Rev. H | Page 4 of 64 | January 2011
ADSP-BF531/ADSP-BF532/ADSP-BF533
BLACKFIN PROCESSOR CORE
As shown in Figure 2 on Page 5, the Blackfin processor core
contains two 16-bit multipliers, two 40-bit accumulators, two
40-bit ALUs, four video ALUs, and a 40-bit shifter. The compu-
tation units process 8-bit, 16-bit, or 32-bit data from the
register file.
The compute register file contains eight 32-bit registers. When
performing compute operations on 16-bit operand data, the
register file operates as 16 independent 16-bit registers. All
operands for compute operations come from the multiported
register file and instruction constant fields.
Each MAC can perform a 16-bit by 16-bit multiply in each
cycle, accumulating the results into the 40-bit accumulators.
Signed and unsigned formats, rounding, and saturation are
supported.
The ALUs perform a traditional set of arithmetic and logical
operations on 16-bit or 32-bit data. In addition, many special
instructions are included to accelerate various signal processing
tasks. These include bit operations such as field extract and
population count, modulo 2
32
multiply, divide primitives, satu-
ration and rounding, and sign/exponent detection. The set of
video instructions includes byte alignment and packing opera-
tions, 16-bit and 8-bit adds with clipping, 8-bit average
operations, and 8-bit subtract/absolute value/accumulate (SAA)
operations. Also provided are the compare/select and vector
search instructions.
For certain instructions, two 16-bit ALU operations can be per-
formed simultaneously on register pairs (a 16-bit high half and
16-bit low half of a compute register). Quad 16-bit operations
are possible using the second ALU.
The 40-bit shifter can perform shifts and rotates and is used to
support normalization, field extract, and field deposit
instructions.
The program sequencer controls the flow of instruction execu-
tion, including instruction alignment and decoding. For
program flow control, the sequencer supports PC relative and
indirect conditional jumps (with static branch prediction), and
subroutine calls. Hardware is provided to support zero-over-
head looping. The architecture is fully interlocked, meaning that
the programmer need not manage the pipeline when executing
instructions with data dependencies.
The address arithmetic unit provides two addresses for simulta-
neous dual fetches from memory. It contains a multiported
register file consisting of four sets of 32-bit index, modify,
length, and base registers (for circular buffering), and eight
additional 32-bit pointer registers (for C-style indexed stack
manipulation).
Blackfin processors support a modified Harvard architecture in
combination with a hierarchical memory structure. Level 1 (L1)
memories are those that typically operate at the full processor
speed with little or no latency. At the L1 level, the instruction
memory holds instructions only. The two data memories hold
data, and a dedicated scratchpad data memory stores stack and
local variable information.
In addition, multiple L1 memory blocks are provided, offering a
configurable mix of SRAM and cache. The memory manage-
ment unit (MMU) provides memory protection for individual
tasks that may be operating on the core and can protect system
registers from unintended access.
The architecture provides three modes of operation: user mode,
supervisor mode, and emulation mode. User mode has
restricted access to certain system resources, thus providing a
protected software environment, while supervisor mode has
unrestricted access to the system and core resources.
The Blackfin processor instruction set has been optimized so
that 16-bit opcodes represent the most frequently used instruc-
tions, resulting in excellent compiled code density. Complex
DSP instructions are encoded into 32-bit opcodes, representing
fully featured multifunction instructions. Blackfin processors
support a limited multi-issue capability, where a 32-bit instruc-
tion can be issued in parallel with two 16-bit instructions,
allowing the programmer to use many of the core resources in a
single instruction cycle.
The Blackfin processor assembly language uses an algebraic syn-
tax for ease of coding and readability. The architecture has been
optimized for use in conjunction with the C/C++ compiler,
resulting in fast and efficient software implementations.
MEMORY ARCHITECTURE
The ADSP-BF531/ADSP-BF532/ADSP-BF533 processors view
memory as a single unified 4G byte address space, using 32-bit
addresses. All resources, including internal memory, external
memory, and I/O control registers, occupy separate sections of
this common address space. The memory portions of this
address space are arranged in a hierarchical structure to provide
a good cost/performance balance of some very fast, low latency
on-chip memory as cache or SRAM, and larger, lower cost and
performance off-chip memory systems. See Figure 3, Figure 4,
and Figure 5 on Page 6.
The L1 memory system is the primary highest performance
memory available to the Blackfin processor. The off-chip mem-
ory system, accessed through the external bus interface unit
(EBIU), provides expansion with SDRAM, flash memory, and
SRAM, optionally accessing up to 132M bytes of
physical memory.
The memory DMA controller provides high bandwidth data-
movement capability. It can perform block transfers of code or
data between the internal memory and the external
memory spaces.
Internal (On-Chip) Memory
The processors have three blocks of on-chip memory that pro-
vide high bandwidth access to the core.
The first block is the L1 instruction memory, consisting of up to
80K bytes SRAM, of which 16K bytes can be configured as a
four way set-associative cache. This memory is accessed at full
processor speed.