DIGITAL SIGNAL PROCESSORS
Mukesh. C. Motwani
Department of Electronics
Vishwakarma Institute of Technology, Pune 411037
01/99
INTRODUCTION:
We live in a digital world. Digital Signal Processing (DSP)
technology is nowadays commonplace in devices such as mobile phones,
multimedia computers, video recorders, CD players, hard disc drive controllers
and modems, and will soon replace analog circuitry in commercial TV sets and
telephones. An important application of DSP is in signal compression and
decompression.
The development of DSP dates from the 1960s, with the use of
mainframe digital computers for number-crunching applications. These
techniques were not widely used at that time, because suitable computing
equipment was available only in universities and other scientific research
institutions. A revolution began in the late 1970s with the introduction of
a device called the Digital Signal Processor. The digital signal processor is
a digital electronic device that places heavy emphasis on performing the
most fundamental DSP operation, namely multiply-accumulate, at high
speed. General-purpose microprocessors such as the Intel x86 family are not
ideally suited to the numerically intensive requirements of DSP, and
during the 1980s the increasing importance of DSP led several major
electronics manufacturers, such as Texas Instruments, Analog Devices and
Motorola, to develop Digital Signal Processor chips: specialized
microprocessors with architectures designed specifically for the types
of operations required in DSP.
Like a general-purpose microprocessor, a DSP chip is a
programmable device, with its own native instruction code. DSP chips are
capable of carrying out millions of floating point operations per second, and
faster and more powerful versions are continually being introduced. Thus, this
technological revolution continues today and has become a primary force in
making DSP an important commercial technology. The future of DSP is in some
sense limited only by human creativity and our ability to successfully apply
DSP to original problems!
DIGITAL SIGNAL PROCESSORS
Digital Signal Processing (DSP) is, essentially, the application of
mathematical operations to a series of digital samples representing
physical-world signals such as audio waves, or complex CAT-scan sensor
samples.
What is a DSP system? A digital signal processor
accepts one or more discrete-time inputs and produces one or more outputs, as
shown in the block diagram. A low-pass filter is put in series with the input
to prevent aliasing. A sample-and-hold circuit keeps the value at the
input of the analog-to-digital converter (ADC) constant during conversion, as
the data is continuously changing. The ADC then passes its digitized signal
information to a DSP, which does the bulk of the processing. In some
situations the data is already in digital form, such as the output from the
readout system of a CD (compact disc) player. Eventually, when the DSP has
finished its processing, the digital data may be turned back into an analog
signal using a digital-to-analog converter (DAC). The operation of the
processor on the input samples could be linear or nonlinear, time-invariant
or time-varying, depending on the application of interest. For instance, a
DSP can filter the noise from a signal, remove unwanted interference, amplify
certain frequencies and suppress others, encrypt information, or analyze a
complex waveform into its spectral components. A low-pass filter is used to
smooth the analog output.
Fixed Point versus Floating Point DSP chips: DSP chip word size determines
resolution and dynamic range. In fixed-point processors, a linear
relationship exists between word size and dynamic range. Floating-point
chips perform integer or real arithmetic. Normally, floating-point DSP
formats are 32 data bits wide, in which 24 bits form the mantissa and 8 bits
make up the exponent. Fixed-point DSPs are either 16 or 24 data bits wide.
In general, fixed-point DSPs are low cost, and designers tend to use them in
high-volume embedded applications. These devices are the obvious choice for
cellular phones, modems, and hard discs. In contrast, floating-point chips
are used in system-level controls where performance, not cost, is the main
concern. Floating-point calculations are indispensable in graphics and
imaging.
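To make the fixed-point side of this trade-off concrete, here is a sketch (an assumption-laden illustration, not taken from any particular chip's library) of Q15 arithmetic, the fractional format commonly used on 16-bit fixed-point DSPs, where values in [-1, 1) are scaled by 2^15:

```c
#include <stdint.h>

/* Illustrative Q15 fixed-point arithmetic for a 16-bit fixed-point
   DSP: a value x in [-1, 1) is stored as the integer x * 2^15. */
typedef int16_t q15_t;

/* Multiply two Q15 numbers: the 32-bit product is in Q30 format,
   so shift right by 15 bits to return to Q15. */
q15_t q15_mul(q15_t a, q15_t b)
{
    int32_t prod = (int32_t)a * (int32_t)b;  /* Q30 intermediate */
    return (q15_t)(prod >> 15);              /* back to Q15      */
}

/* Helper conversions, for illustration only. */
q15_t q15_from_float(float x) { return (q15_t)(x * 32768.0f); }
float q15_to_float(q15_t x)   { return x / 32768.0f; }
```

This is why word size and dynamic range are linearly related on fixed-point parts: every extra bit of word length buys exactly one more bit of resolution, whereas a floating-point exponent buys dynamic range independently of mantissa precision.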
DSP versus General Microprocessors:
DSPs differ from microprocessors in a number of ways. Microprocessors are
typically built for a range of general-purpose functions, and normally run
large blocks of software, such as operating systems like UNIX. Microprocessors
aren't often called upon for real-time computation. And though microprocessors
have some numeric capabilities, they're nowhere near fast enough for most DSP
applications. DSP chips are primarily designed for real-time number-crunching
applications. They have dual (data and program) memories, sophisticated
address generators, efficient external interfaces for I/O, along with
powerful functional units such as the adder, barrel shifter, and a dedicated
hardware multiplier, together with fast registers.
General-purpose microprocessors, besides lacking a hardware multiplier
and taking several tens of clock cycles to compute a single multiply, also
lack the high memory bandwidth, low power dissipation, real-time I/O
capabilities, and cost advantages of DSP chips.
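The multiply-accumulate operation that the hardware multiplier accelerates is the inner loop of nearly every DSP algorithm. As a sketch (the function name and data are illustrative), here is the MAC kernel of an N-tap FIR filter in C, which a DSP executes at one tap per cycle but a general-purpose processor of that era could not:

```c
#include <stddef.h>

/* Illustrative multiply-accumulate (MAC) kernel: one output sample
   of an n-tap FIR filter, computed as the dot product of the filter
   coefficients and the n most recent input samples. */
float fir_sample(const float *coeff, const float *x, size_t n)
{
    float acc = 0.0f;                 /* the accumulator         */
    for (size_t i = 0; i < n; i++)
        acc += coeff[i] * x[i];       /* one MAC per filter tap  */
    return acc;
}
```

A dedicated multiplier, dual memory buses (one fetch each for `coeff[i]` and `x[i]`) and a zero-overhead loop let a DSP retire each iteration of this loop in a single cycle.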
DSP ARCHITECTURE
Figure shows the RTL architecture of a digital signal processor. The program
bus and the data bus are separate from each other, as are the program and
data memories. These parallel buses allow an instruction and data to be
fetched at the same time. This separation of data and program buses
characterizes the so-called HARVARD ARCHITECTURE. The program counter (PC)
and the index registers are used in computing the addresses of the next
instruction and the coefficients. The instruction is decoded from the
instruction register (IR), and the address of the data is calculated using
the adder and the base index register provided with the data memory. An ALU,
multiplier and a register file constitute a multiply-accumulate (MAC)
structure, which is a basic building block. A 32-bit barrel shifter, which
can shift an operand any number of bits in one clock cycle, is provided to
allow incorporation of multiple word lengths within the data path and the
data and program buses. A 16-bit or 32-bit CPU maintains precision during
the chained calculations needed in most digital filter applications. Sizable
amounts of on-chip registers, ROM, EEPROM and RAM are provided so that data
and instructions can be accessed very quickly. Input samples are read in
from the input buffer and results are written into the output buffer. The
chip can interface to a host computer via the external interface.
Analog Devices has introduced the Super Harvard Architecture (SHARC), which
delivers the highest achievable performance for a large class of signal
processing algorithms; it has the right features for I/O and inter-processor
communication, and its packaging and power consumption are suitable for
dense system integration.
Vendors are focusing on several key aspects of new DSP architectures. An
extreme example of parallelism is Texas Instruments' C6x
very-long-instruction-word (VLIW) DSP with eight parallel functional units.
Although Analog Devices' SHARC can perform as many as seven operations per
cycle, the company and other vendors are working feverishly to develop their
own "VLIW-ized" DSP chips. In contrast to superscalar architectures, VLIW
simplifies a DSP's control logic by providing independent control for each
processing unit.
Some DSP devices make provision for the repetition of instructions without
the need for software loops and the consequent instruction-fetch overheads;
for instance, the TMS320C30 has a 'zero-overhead' loop instruction which can
be used to quickly repeat an operation some number of times.
Addressing Modes: Addressing modes are the various ways of specifying the
data. Operands are stored either in the register files or in memory (on-chip
and off-chip). Their addresses are present either directly in the
instruction, or indirectly in a CPU register. In either case, the addressing
mode must declare whether the operand is in the registers or in memory and
provide its address.
Immediate Addressing: In this mode the data is contained in the instruction
itself.
Direct Addressing: The address of the operand in memory is obtained from the
least significant bits (LSBs) of the instruction. In the TMS320C30, the 16
LSBs of the instruction are concatenated with the 8 LSBs of the Data Page
(DP) pointer.
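The concatenation just described can be sketched in C as a bit-field operation (the function name is hypothetical; the 8-bit page and 16-bit offset widths follow the TMS320C30 description above):

```c
#include <stdint.h>

/* Sketch of direct-address formation as described for the TMS320C30:
   the 8 LSBs of the data-page (DP) pointer are concatenated with the
   16 LSBs of the instruction word, yielding a 24-bit data address. */
uint32_t direct_address(uint32_t dp, uint32_t instruction)
{
    uint32_t page   = dp & 0xFFu;             /* 8 LSBs of DP pointer  */
    uint32_t offset = instruction & 0xFFFFu;  /* 16 LSBs of the opcode */
    return (page << 16) | offset;             /* 24-bit address        */
}
```

The DP pointer thus selects one of 256 pages of 64K words each, and the instruction supplies the offset within that page.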
Indirect Addressing: The location of the operand in memory is pinpointed
through a combination of the contents of an auxiliary register, optional
displacements, and the index registers available. The auxiliary addressing
register units (AARUs) are functional units that calculate the effective
address of the operand. This technique is particularly useful when blocks of
data are being processed, since provision is made for automatically
incrementing or decrementing the address stored in the register following
each reference.
Register Addressing: In this case the instruction specifies the CPU/ALU/DP
(Data Page) register in which the data is stored.
Parallel Addressing: Instructions that allow the concurrent use of
functional units are specified by the parallel addressing mode. In this
mode, the four operands could be derived from two registers and two memory
locations. In case the two operands are stored in separate memories, they
can use separate data buses to get to the functional units. The other two
operands have to be accessed from the register files. The result is written
into the register files as well.
Circular Addressing: One of the specialized addressing modes available for
signal processing applications is circular addressing. In most real-time
signal processing applications, such as those found in filtering, the input
is an infinite stream of data samples. These samples are windowed and used
in filtering applications. The data samples simulate a tapped-delay line,
and the oldest sample is written over by the most recent sample. The filter
coefficients and the data samples are written into two circular buffers.
Then they are multiplied and accumulated together to form the output sample
result, which is then stored. The address pointer for the data buffer is
then updated and the samples appear shifted by one sample period, the oldest
data being written out and the most recent data written into that location.
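The tapped-delay-line behaviour described above can be sketched in C with an explicit modulo on the index; on a DSP the wrap-around is done by the address generator for free, but the logic is the same (the struct and names here are illustrative, with a 4-tap buffer assumed):

```c
#include <stddef.h>

/* Sketch of circular (modulo) addressing: a tapped-delay line where
   the newest sample overwrites the oldest, and the filter walks the
   buffer modulo its length instead of shifting the data. */
#define TAPS 4

typedef struct {
    float  buf[TAPS];  /* the delay line                     */
    size_t head;       /* index of the most recent sample    */
} circ_buf;

/* Insert a new sample, overwriting the oldest one. */
void circ_push(circ_buf *c, float x)
{
    c->head = (c->head + 1) % TAPS;  /* advance pointer with wrap */
    c->buf[c->head] = x;
}

/* FIR output: coeff[0] multiplies the newest sample, coeff[TAPS-1]
   the oldest, reading the delay line backwards modulo TAPS. */
float circ_fir(const circ_buf *c, const float *coeff)
{
    float acc = 0.0f;
    for (size_t i = 0; i < TAPS; i++)
        acc += coeff[i] * c->buf[(c->head + TAPS - i) % TAPS];
    return acc;
}
```

Because only the pointer moves, no samples are ever copied; the "shift by one sample period" the text describes costs a single pointer update per input.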
Bit-Reversed Addressing: In the computation of the FFT using butterfly-based
algorithms, the addresses of the outputs are bit-reversed with respect to
the inputs. Many signal processors restore the original order of the outputs
without incurring additional cycles. A modulo addressing feature is also
provided.
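Bit reversal simply mirrors the bits of an index. As a sketch (a plain C loop standing in for the single-cycle hardware feature), for an N-point FFT with N = 2^bits:

```c
#include <stdint.h>

/* Sketch of bit-reversed addressing for an N-point FFT, N = 2^bits:
   the bits of the index are mirrored, e.g. for N = 8 (3 bits),
   index 1 (binary 001) maps to 4 (binary 100). DSP address
   generators do this in one cycle; here it is an explicit loop. */
uint32_t bit_reverse(uint32_t index, unsigned bits)
{
    uint32_t rev = 0;
    for (unsigned i = 0; i < bits; i++) {
        rev = (rev << 1) | (index & 1u);  /* shift LSB into rev */
        index >>= 1;
    }
    return rev;
}
```

Reading FFT results through such an address generator restores natural order on the fly, which is why no extra reordering pass (and hence no extra cycles) is needed.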
DSP Software Development: DSP software development is very much a matter of
fully exploiting the parallel processing capabilities inherent in the
architecture and instruction set of the particular device being used.
High-level languages such as C have not caught on in a significant way in
the DSP context. HLLs lack support for needed features such as the Harvard
architecture, circular buffers and the like. The disadvantage of C lies in
the fact that C was not designed for numeric types and operators, and its
standard math libraries are not robust. A number of extensions have since
been added to form DSP/C, which is a superset of ANSI C. Other parallel
processing languages such as Silage may also be used. DSP application
developers would prefer to use an HLL as opposed to an assembler. VHDL
executable specifications can be written to simulate the performance of the
DSP/C algorithms.
MULTIPROCESSING:
Current DSP algorithms typically require processing rates of 1-30 million
instructions per second (MIPS), while video coding/decoding, medical imaging
and missile guidance algorithms may require from 0.1-10 billion instructions
per second. Since DSP algorithms permit a very high degree of parallelism,
DSP chips can be used for super-computing.
The algorithm is expressed as a fully specified flow graph and its
performance analyzed. The requisite number of processors is then chosen, and
the communication structure determined. Every cycle of every processor is
devoted directly to the arithmetic implementation of the algorithm. Once the
data areas and the registers are initialized, the processors commence
synchronous operation, maintaining correct data precedence.
To reduce the complexity of the search algorithm, all operands and results
can be fetched from adjacent processors or from local storage, and
interprocessor communications between adjacent processor pairs are
independent of all other processor communications.
The system may be composed of constituent processor boards, which are
self-contained, and a communications motherboard which provides memory-level
connectivity to form a simple synchronous bi-directional ring or a regular
square array. Typical data flow configurations include pipelined and star
architectures. A pipelined architecture has the data flowing linearly
through the processor chain. A star architecture uses multiple processors
clustered around a master processor that controls the data flow.
Which system bus is used? The two host system bus environments typically
used for today's DSP systems are VME and PCI. Traditionally,
high-performance multiprocessor DSP systems have been built around the VME
bus because of its legacy of performance, robustness, ruggedness, and
flexibility. In recent years, PCI bus systems have gained significant
acceptance, driven largely by their options for lower-cost implementations.
DSP chip manufacturers are providing effective multiprocessing communication
tools, such as the TMS320C40 and ADSP Z3 with six parallel communication
ports on chip, and with the necessary logic to provide a zero-chip interface
between pairs of these ports! The single most important factor limiting
their performance has been the difficulty of developing a sophisticated
software compilation and synchronous scheduling environment. Thus, although
multiprocessing can muster sufficient processing power in most cases, the
multiple chips often require more complex programming, consume more power,
occupy more board space and often duplicate on-chip resources.
Coming to the rescue are more highly integrated solutions that combine
control and DSP functions on the same chip, saving power, space and
resources, and simplifying programming. These include microcontrollers with
some DSP computational support added to the architecture or, conversely, DSP
circuits with controller features and peripherals; highly integrated chips
with an embedded controller and DSP co-resident on the same chip; DSP and
controller architectures fully merged into a single processing engine; and
full custom solutions assembled from design-library building blocks.
The chip can interface to a host computer via the External Interface Unit
(EIU). The EIU is a multiplexer that allows access to and from memory. In
recent versions of programmable processors, EIUs are quite sophisticated and
routinely use DMA controllers, communication ports, bus arbiters, and serial
and parallel I/O ports. Pipelining is one of the most important methods for
improving the performance of the processor.
CONCLUSION
The recently-introduced Texas Instruments TMS320C8x and the well-established
Analog Devices ADSP-2106x SHARC processors are the two highest-performance
floating-point DSPs on the market today. Which of these two processors
provides the higher system performance depends on the kind of task one is
trying to perform. Although devices with wide instruction words, dual
memory, instructions with maximum scope for parallel processing, and short
machine cycle times will generally give the best performance, it is often
the case that the critical factor is the availability of a particular
instruction which, if available and fully exploited, can make an otherwise
unimpressive (and low-cost) device the optimal choice.
Assuming that the application can run from internal memory, a 'C67x is more
efficient in managing a single high-bandwidth stream than the SHARC, due to
the high performance of its external memory interface (EMIF). An image
recognition system, for example, may have a high-bandwidth pixel stream that
is processed by a small correlation kernel residing in internal memory.
The SHARC, on the other hand, is more effective using its DMA resources to
manage multiple medium-bandwidth channels, assuming the data is available on
the link ports. In applications where I/O data transfers from the I/O port
to local DSP memory are interleaved with processor data accesses (local
memory to internal registers), there is a trade-off between data-block size
and real-time response, no matter which DSP is selected.
Apart from speed, the other major performance issues are memory size
(on-board and externally addressable), ease of handling multiple-precision
and floating-point arithmetic (in particular, how wide the data word is and
what special instructions are provided), and what peripheral devices, such
as serial and parallel interfaces, timers and multi-processing capability,
are available on-chip.
The next stage in the DSP evolution is the use of embedded digital signal
processors in specific applications. In computer systems, DSPs may be
employed as attached processors, assisting a general-purpose host
microprocessor. The DSP market is fast expanding and is expected to reach
one billion US dollars within a couple of years, by which time DSP chips are
likely to be present in every home appliance! So it is really high time that
microprocessor system designers start using digital signal processors in
their applications.