4 Hardware Architecture (continued)
4.2 DSP16000 Core Architectural Overview
The DSP16411 contains two identical DSP16000
cores. As shown in Figure 2 on page 21, each core
consists of four major blocks: system control and cache
(SYS), data arithmetic unit (DAU), Y-memory space
address arithmetic unit (YAAU), and X-memory space
address arithmetic unit (XAAU). Bits within the auc0
and auc1 registers configure the DAU mode-controlled
operations. See the DSP16000 Digital Signal Processor
Core Information Manual for a complete description
of the DSP16000 core.
4.2.1 System Control and Cache (SYS)
The SYS block consists of the control block and the
cache.
The control block provides overall system coordination
that is mostly invisible to the user. It includes an
instruction decoder and sequencer, a pseudorandom
sequence generator (PSG), an interrupt and trap handler,
a wait-state generator, and low-power standby mode
control logic. The interrupt and trap handler provides
a user-locatable vector table and three levels of
user-assigned interrupt priority.
SYS contains the alf register, a 16-bit register that
holds AWAIT (a power-saving standby mode bit) and
peripheral flags. The inc0 and inc1 registers are 20-bit
interrupt control registers, and ins is a 20-bit
interrupt status register.
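The three-level priority scheme described above can be sketched as a small behavioral model. It rests on stated assumptions: each source is given a priority of 0 (disabled) or 1 to 3, a pending request may preempt only code running at a lower level, and ties go to the lowest-numbered source. The array `priority`, the `pending` bit mask, the function `select_interrupt`, and the source count of 20 (chosen only to match the width of ins) are all hypothetical; they do not reproduce the actual inc0/inc1 or ins bit layouts.

```c
#include <stdint.h>

/* Hypothetical source count, chosen to match the 20-bit width of ins. */
#define NUM_SOURCES 20

/*
 * Behavioral sketch (NOT the inc0/inc1 register layout):
 *   priority[i] : 0 = disabled, 1..3 = user-assigned level (assumed encoding)
 *   pending     : bit i set when source i is requesting service
 *   current     : level of the code now running (0 = background)
 * Returns the source to service, or -1 if nothing may preempt.
 */
int select_interrupt(const uint8_t priority[NUM_SOURCES],
                     uint32_t pending, uint8_t current)
{
    int best = -1;
    uint8_t best_level = current;      /* a request must beat the running level */
    for (int i = 0; i < NUM_SOURCES; i++) {
        if (!(pending & (UINT32_C(1) << i)))
            continue;                  /* source i is not requesting */
        uint8_t level = priority[i] & 0x3u;
        if (level > best_level) {      /* strict '>' keeps the lowest index on ties */
            best_level = level;
            best = i;
        }
    }
    return best;
}
```

Because `best_level` starts at the running level, a disabled source (level 0) can never be selected, and equal-priority requests cannot preempt each other.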
Programs use the instruction cache to store and execute
repetitive operations such as those found in an FIR or
IIR filter section. The cache can hold up to thirty-one
instructions, each of which may be 16 or 32 bits wide.
The code in the cache can repeat up to 2^16 - 1 (65,535)
times without looping overhead. Operations in the cache
that require a coefficient access execute at twice the
normal rate because the XAAU and its associated bus are
not needed for fetching instructions. The cache greatly
reduces the need for writing in-line repetitive code
and, therefore, reduces instruction/coefficient memory
size requirements. In addition, the use of the cache
reduces power consumption because it eliminates memory
accesses for instruction fetches.
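As an illustration of the kind of loop body the cache is meant to hold, the following C sketch computes one FIR filter output as a multiply-accumulate over the taps; on the DSP, this per-tap work is what would be repeated from the cache. The function `fir_dot` and the constant `CACHE_MAX_REPEATS` are hypothetical names. The repeat limit of 2^16 - 1 comes from the text, and `x` is assumed to point at delay-line samples already ordered to match `h`.

```c
#include <stddef.h>
#include <stdint.h>

/* Repeat limit for a cache loop given in the text: 2^16 - 1. */
#define CACHE_MAX_REPEATS 65535u

/*
 * One FIR output: y = h[0]*x[0] + h[1]*x[1] + ... + h[ntaps-1]*x[ntaps-1].
 * x is assumed to point at delay-line samples already ordered to match h.
 * Returns 0 (leaving *y unchanged) when ntaps exceeds the repeat limit,
 * which a real program would handle by splitting the loop.
 */
int fir_dot(const int16_t *h, const int16_t *x, size_t ntaps, int64_t *y)
{
    if (ntaps > CACHE_MAX_REPEATS)
        return 0;
    int64_t acc = 0;                   /* wide accumulator (64 bits here, 40 on the DSP) */
    for (size_t k = 0; k < ntaps; k++)
        acc += (int32_t)h[k] * x[k];   /* one coefficient and one sample per tap */
    *y = acc;
    return 1;
}
```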
The cache provides a convenient, low-overhead looping
structure that is interruptible, savable, and restorable.
The cache is addressable in both the X and Y memory
spaces. An interrupt or trap handling routine can save
and restore cloop, cstate, csave, and the contents of
the cache. The cloop register controls the cache loop
count. The cstate register contains the current state of
the cache. The 32-bit csave register holds the opcode of
the instruction following the loop instruction in
program memory.
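The save-and-restore behavior can be modeled as below. The `cache_context` structure, the field widths other than the 32-bit csave, the one-word-per-slot layout of the cached instructions, and the handler itself are illustrative assumptions, not the documented register formats. The point of the sketch is the ordering: save the complete context before reusing the cache, then restore it so the interrupted loop resumes.

```c
#include <stdint.h>

#define CACHE_SLOTS 31   /* up to thirty-one instructions, per the text */

/* Hypothetical model of the cache context; only csave's 32-bit width is from the text. */
typedef struct {
    uint16_t cloop;              /* loop count (16 bits assumed, matching the 2^16 - 1 limit) */
    uint16_t cstate;             /* current cache state (width and encoding assumed) */
    uint32_t csave;              /* opcode of the instruction following the loop */
    uint32_t body[CACHE_SLOTS];  /* cached instructions, one word per slot (assumed) */
} cache_context;

/* Hypothetical handler that reuses the cache for its own short loop. */
void handler_with_own_loop(cache_context *live, uint16_t own_count)
{
    cache_context saved = *live;   /* 1. save cloop, cstate, csave, and the cache contents */
    live->cloop = own_count;       /* 2. handler loads its own loop count ...              */
    live->cstate = 0;              /*    ... starting from an assumed "idle" state          */
    /*    the handler's cached loop would execute here, clobbering the cache */
    *live = saved;                 /* 3. restore so the interrupted loop resumes            */
}
```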
4.2.2 Data Arithmetic Unit (DAU)
The DAU is a power-efficient, dual-MAC (multiply/accumulate),
parallel-pipelined structure that is tailored to
communications applications. It can perform two
double-word (32-bit) fetches, two multiplications, and
two accumulations in a single instruction cycle. The
dual-MAC parallel pipeline begins with two 32-bit
registers, x and y. When these registers feed the two
signed 16-bit × 16-bit multipliers, the pipeline treats
them as four 16-bit signed registers. Each multiplier
produces a full 32-bit result; the two results are
stored in registers p0 and p1. The DAU can direct the
output of each multiplier to a 40-bit ALU or a 40-bit
3-input ADDER. The ALU and ADDER results are each
stored in one of eight 40-bit accumulators, a0 through
a7. Both the ALU and ADDER include an ACS
(add/compare/select) function for Viterbi decoding. The
DAU can direct the output of each accumulator to the
ALU/ACS, the ADDER/ACS, or a 40-bit BMU (bit
manipulation unit).
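A behavioral C sketch of one dual-MAC step follows. It splits x and y into signed 16-bit halves, forms two full 32-bit products (playing the role of p0 and p1), and adds them into two accumulators wrapped to 40 bits. The pairing of halves (high with high, low with low), the helper names, and the use of `int64_t` to hold a 40-bit accumulator are assumptions; real DSP16000 instructions select operands and apply mode-controlled shifts that this sketch omits.

```c
#include <stdint.h>

/* Interpret the low 16 bits of 'bits' as a signed 16-bit value. */
static int32_t s16(uint32_t bits)
{
    bits &= 0xFFFFu;
    return (bits & 0x8000u) ? (int32_t)bits - 0x10000 : (int32_t)bits;
}

/* Reduce v to 40-bit two's complement, as an accumulator a0..a7 would hold it. */
int64_t wrap40(int64_t v)
{
    int64_t low = (int64_t)((uint64_t)v & ((UINT64_C(1) << 40) - 1)); /* 0 .. 2^40-1 */
    return (low & (INT64_C(1) << 39)) ? low - (INT64_C(1) << 40) : low;
}

/*
 * One dual-MAC step (assumed operand pairing):
 *   p0 = x_high * y_high,  p1 = x_low * y_low   (each a full 32-bit product)
 *   acc[0] += p0,          acc[1] += p1          (40-bit wraparound, no saturation)
 */
void dual_mac(int64_t acc[2], uint32_t x, uint32_t y)
{
    int32_t p0 = s16(x >> 16) * s16(y >> 16);
    int32_t p1 = s16(x) * s16(y);
    acc[0] = wrap40(acc[0] + p0);
    acc[1] = wrap40(acc[1] + p1);
}
```

For example, with x = 0x0003FFFE (halves 3 and -2) and y = 0x00050004 (halves 5 and 4), one step from zeroed accumulators leaves 15 and -8.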
The ALU implements 2-input addition, subtraction, and
various logical operations. The ADDER implements
2-input or 3-input addition and subtraction. To support
Viterbi decoding, the ALU and ADDER have a split mode
in which two simultaneous 16-bit additions or
subtractions are performed. This mode, available in
specialized dual-MAC instructions, is used to compute
the distance between a received symbol and its
estimate.
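Split mode can be pictured as SIMD within a register: two 16-bit lanes are added or subtracted independently, so no carry or borrow crosses from the low lane into the high lane. The sketch below models the lanes as the two halves of a 32-bit word, for example two components of the difference between a received symbol and its estimate. That packing and the function names are assumptions; the exact accumulator fields used by split mode are not described here.

```c
#include <stdint.h>

/* Lane-wise 16-bit addition: each half wraps modulo 2^16 on its own. */
uint32_t split_add16(uint32_t a, uint32_t b)
{
    uint32_t hi = ((a >> 16) + (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) + (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}

/* Lane-wise 16-bit subtraction: a borrow in the low lane does not reach the high lane. */
uint32_t split_sub16(uint32_t a, uint32_t b)
{
    uint32_t hi = ((a >> 16) - (b >> 16)) & 0xFFFFu;
    uint32_t lo = ((a & 0xFFFFu) - (b & 0xFFFFu)) & 0xFFFFu;
    return (hi << 16) | lo;
}
```

With lanes (5, 3) and (2, 7), `split_sub16(0x00050003, 0x00020007)` gives lanes (3, -4), that is 0x0003FFFC, whereas an ordinary 32-bit subtraction gives 0x0002FFFC because the low-lane borrow propagates into the high lane.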
The ACS provides the add/compare/select function
required for Viterbi decoding. This unit provides flags
to the traceback encoder for implementing
mode-controlled side effects for ACS operations. The
source operands for the ACS are any two accumulators,
and results are written back to one of the source
accumulators.
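A generic add-compare-select step for a trellis state with two predecessors is sketched below. Following the text, the survivor metric is written back to one of the source operands, and a decision bit is returned for traceback. The smaller-is-better metric convention, the separate branch-metric arguments, and the name `acs` are assumptions; this does not describe the DSP16000 ACS instruction encoding or its mode-controlled side effects.

```c
#include <stdint.h>

/*
 * One add-compare-select step (generic sketch, smaller metric = better, assumed).
 *   *m0, m1 : path metrics of the two predecessor states
 *   b0, b1  : branch metrics for the transitions from those states
 * The survivor overwrites *m0 (one of the sources); the return value is the
 * decision bit a traceback would record (1 = survivor came from predecessor 1).
 */
int acs(int32_t *m0, int32_t m1, int32_t b0, int32_t b1)
{
    int32_t c0 = *m0 + b0;     /* add: candidate through predecessor 0 */
    int32_t c1 = m1 + b1;      /* add: candidate through predecessor 1 */
    int take1 = c1 < c0;       /* compare: ties favor predecessor 0 */
    *m0 = take1 ? c1 : c0;     /* select: write the survivor back to a source */
    return take1;              /* decision bit for traceback */
}
```

A decoder would typically shift each returned bit into a traceback word, for example `tb = (tb << 1) | acs(&m, m1, b0, b1);`.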
The BMU implements barrel-shift, bit-field insertion,
bit-field extraction, exponent extraction, normalization,
and accumulator shuffling operations. ar0 through ar3
are auxiliary registers whose main function is to
control BMU operations.
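The following sketch illustrates three of the BMU operations on a 40-bit value held in a 64-bit integer: bit-field extraction, bit-field insertion, and exponent extraction followed by normalization. Here the exponent is taken to be the number of redundant sign bits, so normalizing shifts left until bits 39 and 38 differ. The DSP16000's own exponent convention and its use of ar0 through ar3 as control registers may differ, and all function names are hypothetical.

```c
#include <stdint.h>

/* Extract the unsigned field of 'width' bits starting at bit 'pos' (0 = LSB). */
uint64_t extract_field(uint64_t v, unsigned pos, unsigned width)
{
    return (v >> pos) & ((UINT64_C(1) << width) - 1);   /* assumes width < 64 */
}

/* Replace the 'width'-bit field at 'pos' in dst with the low bits of field. */
uint64_t insert_field(uint64_t dst, uint64_t field, unsigned pos, unsigned width)
{
    uint64_t mask = ((UINT64_C(1) << width) - 1) << pos;
    return (dst & ~mask) | ((field << pos) & mask);
}

/* Redundant sign bits of a 40-bit value v in [-2^39, 2^39 - 1]; 0 and -1 give 39. */
int exponent40(int64_t v)
{
    int n = 0;
    while (n < 39) {
        /* If v fits in a (39 - n)-bit signed field, it has at least n + 1 redundant sign bits. */
        int64_t lim = INT64_C(1) << (38 - n);
        if (v < -lim || v >= lim)
            break;
        n++;
    }
    return n;
}

/* Shift left by the exponent (multiplication avoids left-shifting a negative value). */
int64_t normalize40(int64_t v)
{
    return v * (INT64_C(1) << exponent40(v));
}
```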
The user can enable overflow saturation to affect the
multiplier output and the results of the three arithmetic
units. Overflow saturation can also affect an accumulator
value as it is transferred to memory or to another
register. These features accommodate various speech
coding standards such as GSM-FR, GSM-HR, and GSM-EFR.
Shifting in the arithmetic pipeline occurs at several
stages to accommodate various standards for
mixed-precision and double-precision multiplications.
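Overflow saturation and the pipeline shift can be illustrated with the 32-bit saturation used by GSM-style fixed-point code. The sketch clamps a wide accumulator value to the 32-bit range, as on a saturated transfer to memory, and builds two operations in the spirit of the ETSI basic operators `L_add` and `L_mult`, where the Q15 × Q15 product is shifted left by one into Q31. The function names are hypothetical, and the sketch does not model which DSP16000 mode bits enable saturation or select the shift.

```c
#include <stdint.h>

/* Clamp a wide (e.g. 40-bit) accumulator value to the signed 32-bit range. */
int32_t sat32(int64_t acc)
{
    if (acc > INT32_MAX) return INT32_MAX;
    if (acc < INT32_MIN) return INT32_MIN;
    return (int32_t)acc;
}

/* Saturating 32-bit addition, in the spirit of the ETSI L_add operator. */
int32_t add_sat32(int32_t a, int32_t b)
{
    return sat32((int64_t)a + b);
}

/* Q15 x Q15 -> Q31: shift the product left by one, then saturate.
   Only -1.0 * -1.0 (0x8000 * 0x8000) actually overflows. */
int32_t mult_q15_sat(int16_t a, int16_t b)
{
    return sat32((int64_t)a * b * 2);
}
```

For instance, `mult_q15_sat(16384, 16384)` (0.5 × 0.5) yields 2^29, which is 0.25 in Q31, while `mult_q15_sat(-32768, -32768)` saturates to `INT32_MAX` instead of wrapping to a negative value.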