DIGITAL SIGNAL PROCESSORS
Mukesh. C. Motwani
Department of Electronics
Vishwakarma Institute of Technology, Pune 411037
01/99
INTRODUCTION:
We live in a digital world. Digital Signal Processing (DSP)
technology is nowadays commonplace in devices such as mobile phones,
multimedia computers, video recorders, CD players, hard disc drive controllers
and modems, and will soon replace analog circuitry in commercial TV sets and
telephones. An important application of DSP is in signal compression and
decompression.
The development of DSP dates from the 1960s, with the use of
mainframe digital computers for number-crunching applications. These
techniques were not widely used at that time, because suitable computing
equipment was available only in universities and other scientific research
institutions. A revolution began in the late 1970s with the introduction of
a device called the Digital Signal Processor. The digital signal processor is
a digital electronic device that places heavy emphasis on performing the
most fundamental DSP operation, namely multiply-accumulate, at high
speed. General-purpose microprocessors such as the Intel x86 family are not
ideally suited to the numerically intensive requirements of DSP, and
during the 1980s the increasing importance of DSP led several major
electronics manufacturers, such as Texas Instruments, Analog Devices and
Motorola, to develop Digital Signal Processor chips: specialized
microprocessors with architectures designed specifically for the types
of operations required in DSP.
Like a general-purpose microprocessor, a DSP chip is a
programmable device, with its own native instruction code. DSP chips are
capable of carrying out millions of floating point operations per second, and
faster and more powerful versions are continually being introduced. Thus, this
technological revolution continues today and has become a primary force in
making DSP an important commercial technology. The future of DSP is in some
sense limited only by human creativity and our ability to successfully apply
DSP to original problems!
DIGITAL SIGNAL PROCESSORS
Digital Signal Processing (DSP) is, essentially, the application of
mathematical operations to a series of digital samples representing
physical-world signals such as audio waves, or complex CAT-scan sensor
samples.
What is a DSP system? A digital signal processor
accepts one or more discrete-time inputs and produces one or more outputs, as
shown in the block diagram. A low-pass filter is put in series with the input
to prevent aliasing. A sample-and-hold circuit keeps the value at the
input of the analog-to-digital converter (ADC) constant during conversion, as
the data is continuously changing. The ADC then passes its digitized signal
information to a DSP, which does the bulk of the processing. In some
situations the data is already in digital form, such as the output from the
readout system of a CD (compact disc) player. Eventually, when the DSP has
finished its processing, the digital data may be turned back into an analog
signal using a digital-to-analog converter (DAC). The operation of the
processor on the input samples could be linear or nonlinear, time-invariant
or time-varying, depending on the application of interest. For instance, a
DSP can filter the noise from a signal, remove unwanted interference, amplify
certain frequencies and suppress others, encrypt information, or analyze a
complex waveform into its spectral components. A low-pass filter is used to
smooth the analog output.
Fixed Point versus Floating Point DSP chips: DSP chip word size determines
resolution and dynamic range. In fixed-point processors, a linear
relationship exists between word size and dynamic range. Floating-point
chips perform integer or real arithmetic. Normally, floating-point DSP
formats are 32 data bits wide, in which 24 bits form the mantissa and 8 bits
make up the exponent. Fixed-point DSPs are either 16 or 24 data bits wide.
In general, fixed-point DSPs are low cost, and designers tend to use them in
high-volume embedded applications. These devices are the obvious choice for
cellular phones, modems, and hard discs. In contrast, floating-point chips
are used in system-level controls where performance, not cost, is the main
concern. Floating-point calculations are indispensable in graphics and
imaging.
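To make the fixed-point side of this trade-off concrete, here is a sketch (an assumption-laden illustration, not taken from any particular chip's library) of Q15 arithmetic, the fractional format commonly used on 16-bit fixed-point DSPs, where values in [-1, 1) are scaled by 2^15:

```c
#include <stdint.h>

/* Illustrative Q15 fixed-point arithmetic for a 16-bit fixed-point
   DSP: a value x in [-1, 1) is stored as the integer x * 2^15. */
typedef int16_t q15_t;

/* Multiply two Q15 numbers: the 32-bit product is in Q30 format,
   so shift right by 15 bits to return to Q15. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    return (q15_t)(prod >> 15);              /* back to Q15      */
}

/* Helper conversions, for illustration only. */
q15_t q15_from_float(float x) { return (q15_t)(x * 32768.0f); }
float q15_to_float(q15_t x)   { return x / 32768.0f; }
```

This is why word size and dynamic range are linearly related on fixed-point parts: every extra bit of word length buys exactly one more bit of resolution, whereas a floating-point exponent buys dynamic range independently of mantissa precision.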
DSP versus General Microprocessors:
DSPs differ from microprocessors in a number of ways. Microprocessors are
typically built for a range of general-purpose functions, and normally run
large blocks of software, such as operating systems like UNIX. Microprocessors
aren't often called upon for real-time computation. And though microprocessors
have some numeric capabilities, they're nowhere near fast enough for most DSP
applications. DSP chips are primarily designed for real-time number-crunching
applications. They have dual (data and program) memories, sophisticated
address generators, efficient external interfaces for I/O, along with
powerful functional units such as the adder, barrel shifter, and a dedicated
hardware multiplier, together with fast registers.
General-purpose microprocessors, besides lacking a hardware multiplier
and taking several tens of clock cycles to compute a single multiply, also
lack the high memory bandwidth, low power dissipation, real-time I/O
capabilities, and cost advantages of DSP chips.
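The multiply-accumulate operation that the hardware multiplier accelerates is the inner loop of nearly every DSP algorithm. As a sketch (the function name and data are illustrative), here is the MAC kernel of an N-tap FIR filter in C, which a DSP executes at one tap per cycle but a general-purpose processor of that era could not:

```c
#include <stddef.h>

/* Illustrative multiply-accumulate (MAC) kernel: one output sample
   of an n-tap FIR filter, computed as the dot product of the filter
   coefficients and the n most recent input samples. */
float fir_sample(const float *coeff, const float *x, size_t n)
{
    float acc = 0.0f;                 /* the accumulator         */
    for (size_t i = 0; i < n; i++)
        acc += coeff[i] * x[i];       /* one MAC per filter tap  */
    return acc;
}
```

A dedicated multiplier, dual memory buses (one fetch each for `coeff[i]` and `x[i]`) and a zero-overhead loop let a DSP retire each iteration of this loop in a single cycle.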
DSP ARCHITECTURE
Figure shows the RTL architecture of a digital signal processor. The program
bus and the data bus are separate from each other, as are the program and
data memories. These parallel buses allow an instruction and data to be
fetched at the same time. This separation of data and program buses
characterizes the so-called HARVARD ARCHITECTURE. The program counter (PC)
and the index registers are used in computing the addresses of the next
instruction and the coefficients. The instruction is decoded from the
instruction register (IR), and the address of the data is calculated using
the adder and the base index register provided with the data memory. An ALU,
multiplier and a register file constitute a multiply-accumulate (MAC)
structure, which is a basic building block. A 32-bit barrel shifter, which
can shift an operand any number of bits in one clock cycle, is provided to
allow incorporation of multiple word lengths within the data path and the
data and program buses. A 16-bit or 32-bit CPU maintains precision during
the chained calculations needed in most digital filter applications. Sizable
amounts of on-chip registers, ROM, EEPROM and RAM are provided so that data
and instructions can be accessed very quickly. Input samples are read in
from the input buffer and results are written into the output buffer. The
chip can interface to a host computer via the external interface.
Analog Devices has introduced the Super Harvard Architecture (SHARC), which
delivers the highest achievable performance for a large class of signal
processing algorithms; it has the right features for I/O and inter-processor
communication, and its packaging and power consumption are suitable for
dense system integration.
Vendors are focusing on several key aspects of new DSP architectures. An
extreme example of parallelism is Texas Instruments' C6x
very-long-instruction-word (VLIW) DSP with eight parallel functional units.
Although Analog Devices' SHARC can perform as many as seven operations per
cycle, the company and other vendors are working feverishly to develop their
own "VLIW-ized" DSP chips. In contrast to superscalar architectures, VLIW
simplifies a DSP's control logic by providing independent control for each
processing unit.
Some DSP devices make provision for the repetition of instructions without
the need for software loops and the consequent instruction-fetch overheads;
for instance, the TMS320C30 has a 'zero-overhead' loop instruction which can
be used to quickly repeat an operation some number of times.
Addressing Modes: Addressing modes are the various ways of specifying the
data. Operands are stored either in the register files or in memory (on-chip
and off-chip). Their addresses are present either directly in the
instruction, or indirectly in a CPU register. In either case, the addressing
mode must declare whether the operand is in the registers or in memory and
provide its address.
Immediate Addressing: In this mode the data is contained in the instruction
itself.
Direct Addressing: The address of the operand in memory is obtained from the
least significant bits (LSBs) of the instruction. In the TMS320C30, the 16
LSBs of the instruction are concatenated with the 8 LSBs of the Data Page
(DP) pointer.
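The concatenation just described can be sketched in C as a bit-field operation (the function name is hypothetical; the 8-bit page and 16-bit offset widths follow the TMS320C30 description above):

```c
#include <stdint.h>

/* Sketch of direct-address formation as described for the TMS320C30:
   the 8 LSBs of the data-page (DP) pointer are concatenated with the
   16 LSBs of the instruction word, yielding a 24-bit data address. */
uint32_t direct_address(uint32_t dp, uint32_t instruction)
{
    uint32_t page   = dp & 0xFFu;             /* 8 LSBs of DP pointer  */
    uint32_t offset = instruction & 0xFFFFu;  /* 16 LSBs of the opcode */
    return (page << 16) | offset;             /* 24-bit address        */
}
```

The DP pointer thus selects one of 256 pages of 64K words each, and the instruction supplies the offset within that page.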
Indirect Addressing: The location of the operand in memory is pinpointed
through a combination of the contents of an auxiliary register, optional
displacements, and the index registers available. The auxiliary addressing
register units (AARUs) are functional units that calculate the effective
address of the operand. This technique is particularly useful when blocks of
data are being processed, since provision is made for automatically
incrementing or decrementing the address stored in the register following
each reference.
Register Addressing: In this case the instruction specifies the CPU/ALU/DP
(Data Page) register in which the data is stored.
Parallel Addressing: Instructions that allow the concurrent use of
functional units are specified by the parallel addressing mode. In this
mode, the four operands could be derived from two registers and two memory
locations. In case the two operands are stored in separate memories, they
can use separate data buses to get to the functional units. The other two
operands have to be accessed from the register files. The result is written
into the register files as well.
Circular Addressing: One of the specialized addressing modes available for
signal processing applications is circular addressing. In most real-time
signal processing applications, such as those found in filtering, the input
is an infinite stream of data samples. These samples are windowed and used
in filtering applications. The data samples simulate a tapped-delay line,
and the oldest sample is written over by the most recent sample. The filter
coefficients and the data samples are written into two circular buffers.
Then they are multiplied and accumulated together to form the output sample
result, which is then stored. The address pointer for the data buffer is
then updated and the samples appear shifted by one sample period, the oldest
data being written out and the most recent data written into that location.
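The tapped-delay-line behaviour described above can be sketched in C with an explicit modulo on the index; on a DSP the wrap-around is done by the address generator for free, but the logic is the same (the struct and names here are illustrative, with a 4-tap buffer assumed):

```c
#include <stddef.h>

/* Sketch of circular (modulo) addressing: a tapped-delay line where
   the newest sample overwrites the oldest, and the filter walks the
   buffer modulo its length instead of shifting the data. */
#define TAPS 4

typedef struct {
    float  buf[TAPS];  /* the delay line                     */
    size_t head;       /* index of the most recent sample    */
} circ_buf;

/* Insert a new sample, overwriting the oldest one. */
void circ_push(circ_buf *c, float x)
{
    c->head = (c->head + 1) % TAPS;  /* advance pointer with wrap */
    c->buf[c->head] = x;
}

/* FIR output: coeff[0] multiplies the newest sample, coeff[TAPS-1]
   the oldest, reading the delay line backwards modulo TAPS. */
float circ_fir(const circ_buf *c, const float *coeff)
{
    float acc = 0.0f;
    for (size_t i = 0; i < TAPS; i++)
        acc += coeff[i] * c->buf[(c->head + TAPS - i) % TAPS];
    return acc;
}
```

Because only the pointer moves, no samples are ever copied; the "shift by one sample period" the text describes costs a single pointer update per input.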
Bit-Reversed Addressing: In the computation of the FFT using butterfly-based
algorithms, the addresses of the outputs are bit-reversed with respect to
the inputs. Many signal processors restore the original order of the outputs
without incurring additional cycles. A modulo addressing feature is also
provided.
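Bit reversal simply mirrors the bits of an index. As a sketch (a plain C loop standing in for the single-cycle hardware feature), for an N-point FFT with N = 2^bits:

```c
#include <stdint.h>

/* Sketch of bit-reversed addressing for an N-point FFT, N = 2^bits:
   the bits of the index are mirrored, e.g. for N = 8 (3 bits),
   index 1 (binary 001) maps to 4 (binary 100). DSP address
   generators do this in one cycle; here it is an explicit loop. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t rev = 0;
    for (unsigned i = 0; i < bits; i++) {
        rev = (rev << 1) | (index & 1u);  /* shift LSB into rev */
        index >>= 1;
    }
    return rev;
}
```

Reading FFT results through such an address generator restores natural order on the fly, which is why no extra reordering pass (and hence no extra cycles) is needed.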
DSP Software Development: DSP software development is very much a matter of
fully exploiting the parallel processing capabilities inherent in the
architecture and instruction set of the particular device being used.
High-level languages such as C have not caught on in a significant way in
the DSP context. HLLs lack support for needed features such as the Harvard
architecture, circular buffers and the like. The disadvantage of C lies in
the fact that C was not designed for numeric types and operators, and its
standard math libraries are not robust. A number of extensions have since
been added to form DSP/C, which is a superset of ANSI C. Other parallel
processing languages such as Silage may also be used. DSP application
developers would prefer to use an HLL as opposed to an assembler. VHDL
executable specifications can be written to simulate the performance of the
DSP/C algorithms.
MULTIPROCESSING:
Current DSP algorithms typically require processing rates of 1-30 million
instructions per second (MIPS), while video coding/decoding, medical imaging
and missile guidance algorithms may require from 0.1-10 billion instructions
per second. Since DSP algorithms permit a very high degree of parallelism,
DSP chips can be used for super-computing.
The algorithm is expressed as a fully specified flow graph and its
performance analyzed. The requisite number of processors is then chosen, and
the communication structure determined. Every cycle of every processor is
devoted directly to the arithmetic implementation of the algorithm. Once the
data areas and the registers are initialized, the processors commence
synchronous operation, maintaining correct data precedence.
To reduce the complexity of the search algorithm, all operands and results
can be fetched from adjacent processors or from local storage, and
interprocessor communications between adjacent processor pairs are
independent of all other processor communications.
The system may be composed of constituent processor boards, which are
self-contained, and a communications motherboard which provides memory-level
connectivity to form a simple synchronous bi-directional ring or a regular
square array. Typical data flow configurations include pipelined and star
architectures. A pipelined architecture has the data flowing linearly
through the processor chain. A star architecture uses multiple processors
clustered around a master processor that controls the data flow.
Which system bus is used? The two host system bus environments typically
used for today's DSP systems are VME and PCI. Traditionally,
high-performance multiprocessor DSP systems have been built around the VME
bus because of its legacy of performance, robustness, ruggedness, and
flexibility. In recent years, PCI bus systems have gained significant
acceptance, driven largely by their options for lower-cost implementations.
DSP chip manufacturers are providing effective multiprocessing communication
tools, such as the TMS320C40 and ADSP Z3 with six parallel communication
ports on chip, and with the necessary logic to provide a zero-chip interface
between pairs of these ports! The single most important factor limiting
their performance has been the difficulty of developing a sophisticated
software compilation and synchronous scheduling environment. Thus, although
multiprocessing can muster sufficient processing power in most cases, the
multiple chips often require more complex programming, consume more power,
occupy more board space and often duplicate on-chip resources.
Coming to the rescue are more highly integrated solutions that combine
control and DSP functions on the same chip, saving power, space and
resources, and simplifying programming. These include microcontrollers with
some DSP computational support added to the architecture or, conversely, DSP
circuits with controller features and peripherals; highly integrated chips
with an embedded controller and DSP co-resident on the same chip; DSP and
controller architectures fully merged into a single processing engine; and
full custom solutions assembled from design-library building blocks.
The chip can interface to a host computer via the External Interface Unit
(EIU). The EIU is a multiplexer that allows access to and from memory. In
recent versions of programmable processors, EIUs are quite sophisticated and
routinely use DMA controllers, communication ports, bus arbiters, and serial
and parallel I/O ports. Pipelining is one of the most important methods for
improving the performance of the processor.
CONCLUSION
The recently-introduced Texas Instruments TMS320C8x and the well-established
Analog Devices ADSP-2106x SHARC processors are the two highest-performance
floating-point DSPs on the market today. Which of these two processors
provides the higher system performance depends on the kind of task one is
trying to perform. Although devices with wide instruction words, dual
memory, instructions with maximum scope for parallel processing, and short
machine cycle times will generally give the best performance, it is often
the case that the critical factor is the availability of a particular
instruction which, if available and fully exploited, can make an otherwise
unimpressive (and low-cost) device the optimal choice.
Assuming that the application can run from internal memory, a 'C67x is more
efficient in managing a single high-bandwidth stream than the SHARC, due to
the high performance of its external memory interface (EMIF). An image
recognition system, for example, may have a high-bandwidth pixel stream that
is processed by a small correlation kernel residing in internal memory.
The SHARC, on the other hand, is more effective using its DMA resources to
manage multiple medium-bandwidth channels, assuming the data is available on
the link ports. In applications where I/O data transfers from the I/O port
to local DSP memory are interleaved with processor data accesses (local
memory to internal registers), there is a trade-off between data-block size
and real-time response, no matter which DSP is selected.
Apart from speed, the other major performance issues are memory size
(on-board and externally addressable), ease of handling multiple-precision
and floating-point arithmetic (in particular, how wide the data word is and
what special instructions are provided), and what peripheral devices, such
as serial and parallel interfaces, timers and multi-processing capability,
are available on-chip.
The next stage in the DSP evolution is the use of embedded digital signal
processors in specific applications. In computer systems, DSPs may be
employed as attached processors, assisting a general-purpose host
microprocessor. The DSP market is fast expanding and is expected to reach
one billion US dollars within a couple of years, by which time DSP chips are
likely to be present in every home appliance! So it is really high time that
microprocessor system designers start using digital signal processors in
their applications.