AVR
Enhanced RISC Microcontrollers
Alf-Egil Bogen
Vegard Wollan
ATMEL Corporation
ATMEL Development Center, Trondheim, Norway
High-level languages (HLLs) are rapidly
becoming the standard programming
methodology for embedded microcontrollers
(MCUs), even for smaller 8-bit devices. The C
language is probably the most widely used HLL
for MCUs, but in most applications it gives an
increased code size compared to assembly
programming. ATMEL identified the need for an
architecture developed specifically for the C
language in order to reduce this overhead to a
minimum. The result is the ATMEL AVR MCU,
which in addition to its optimized code size is a
true single cycle RISC (Reduced Instruction
Set Computer) machine with 32 general-
purpose registers (accumulators), running 4-12
times faster than currently used MCUs.
1. Introduction
The initial AVR product offering consists of
three 8-bit base-line devices with enhanced
16-bit hardware support. Atmel’s low-power
non-volatile memory technology is used for
program code and data. The on-chip program
Flash and data EEPROM are in-system
programmable. The first three AVR MCUs
have 1K, 2K and 8K bytes of program Flash
organized as 16-bit wide instruction words.
The Atmel AVR Enhanced RISC
Microcontrollers offer an architecture concept
that combines high performance with low
power consumption. The full range of
AVR MCUs, from base-line to top end,
features a RISC architecture and an instruction
set optimized for code density, with built-in
support for high-level languages.
Please refer to [1] for more details.
2. Enhanced RISC
Many existing RISC architectures require
larger code size to perform a given task than
traditional CISC (Complex Instruction Set
Computer) architectures. RISC MCUs are
often chosen where high speed is needed:
the reduced instruction set is fast, but each
instruction is limited in complexity.
The AVR is designed to be a RISC MCU with a
larger number of instructions to reduce the
code size and to increase the speed further.
CISC-like instructions are introduced without
letting the RISC performance and low power
consumption features suffer. This first major
enhancement was made after thorough
analysis of several architectures and large
amounts of application code. Still, the regular
AVR RISC architecture enables cost-effective
implementations.
The second enhancement is achieved by
tuning the architecture to optimize code
generation for the C language. This was done
with a large application-oriented benchmark
suite, where the code was pseudo-compiled for
the different enhancement alternatives in the
architecture. Special tuning of the different
addressing modes was important: each mode
had to be “need to have” rather than “nice to
have”.
Many MCU architectures have only a small
number of general registers or working
registers (accumulators), typically 1-8
registers. This is a major drawback for C
compiler design, where a lot of data movement
is necessary. The AVR has 32 general-purpose
working registers, which the C compiler fully
utilizes to achieve the highest code density.
3. True Single Cycle Instructions
With true single cycle instructions the internal
clock is identical to the oscillator clock. There is
no internal divider to produce the different
clock phases!
Most of the micros in the 8- to 16-bit market
divide the clock by a ratio of 1:4 to 1:12,
which is a bottleneck for the speed. At the
same clock frequency the AVR will run a given
task 4 to 12 times faster, or, for the same
throughput, the clock frequency and hence the
power consumption can be reduced by a
factor of 4 to 12. In CMOS technology, the
power consumption of digital logic is
proportional to the frequency.
Figure 1 shows the extreme increase of MIPS
(Million Instructions Per Second) with true
single cycle (1:1 ratio) compared to a clock
division ratio of 1:4 and 1:12.
Figure 1: MIPS/Power Consumption
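The relationship behind Figure 1 can be sketched in a few lines of C. This is only an illustration under a simplifying assumption that is not stated in the text: exactly one instruction completes per internal clock cycle. The function names mips and required_mhz are hypothetical.

```c
#include <assert.h>

/* Instruction throughput in MIPS for an oscillator frequency in MHz and
 * an internal clock division ratio (1 for a true single cycle core).
 * Assumption: one instruction completes per internal clock cycle. */
static double mips(double osc_mhz, unsigned divider)
{
    assert(divider > 0);   /* a division ratio of 0 is meaningless */
    return osc_mhz / divider;
}

/* Oscillator frequency in MHz needed to reach a target throughput.
 * Since CMOS power is proportional to frequency, a core with a 1:12
 * divider needs 12 times the clock of a 1:1 core for the same MIPS. */
static double required_mhz(double target_mips, unsigned divider)
{
    assert(divider > 0);
    return target_mips * divider;
}
```

With a 12 MHz oscillator, for instance, this model gives 12 MIPS at a 1:1 ratio, 3 MIPS at 1:4 and 1 MIPS at 1:12.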
4. Designed for the C language
The C language is the most widely used HLL
for MCUs in the world today. Since most MCU
architectures are developed with assembly
programming in mind, their support for typical
C constructs is poor. ATMEL’s goal was to
develop an architecture that was efficient both
for the C language and assembly. With many C
experts from C compiler suppliers in the design
team, a very code-efficient 8-bit micro with
16-bit support was developed.
When programming in C, one general rule is to
use variables defined within a routine (local)
instead of global variables known in the
whole program. Local variables only
occupy RAM while the specific routine is
executing, whereas global variables
occupy RAM all the time. To handle local
variables quickly and code-efficiently, a lot of
general-purpose registers are needed. The
AVR has 32 general-purpose registers, all of
them in the Arithmetic Logic Unit (ALU) path,
allowing true single cycle instructions. Figure 2
shows how efficient many registers can be
compared to traditional CISC architectures with
one accumulator.
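As a rough illustration, the function evaluated in Figure 2 can be written in C so that each statement corresponds to one register-to-register step. The sketch assumes the intended grouping is ((A and 84h) + (B eor C)) or 80h, which is what the CISC column of the figure computes. The names figure2_function and check_figure2 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Step-by-step version: with enough registers, every operand is already
 * in place and no accumulator shuffling is needed between the steps. */
static uint8_t figure2_function(uint8_t a, uint8_t b, uint8_t c)
{
    b ^= c;                  /* B = B eor C                  */
    a &= 0x84;               /* A = A and 84h                */
    a = (uint8_t)(a + b);    /* A = A + B, wraps at 8 bits   */
    a |= 0x80;               /* A = A or 80h                 */
    return a;
}

/* Checks the step-by-step version against the one-line C expression. */
static void check_figure2(uint8_t a, uint8_t b, uint8_t c)
{
    uint8_t expected = (uint8_t)(((a & 0x84) + (b ^ c)) | 0x80);
    assert(figure2_function(a, b, c) == expected);
}
```

On a single-accumulator machine the same four operations need extra moves to and from the accumulator, which is the overhead the figure counts.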
Figure 2: Efficient code in the AVR
Function: A = ((A .and. 84h) + (B .eor. C)) .or. 80h
AVR code          CISC code
EOR  B,C          MOV ACC,C
ANDI A,#84h       EOR ACC,B
ADD  A,B          MOV TMP,ACC
ORI  A,#80h       MOV ACC,A
                  AND ACC,#84h
                  ADD ACC,TMP
                  OR  ACC,#80h
                  MOV A,ACC
8 bytes           12-16 bytes
4 clocks          48-96 clocks
Three pairs of the 32 registers can be used as
16-bit pointers, allowing indirect jumps and calls
as well as many data memory addressing
modes directly related to the C language. In
addition, the traditional stack for return
addresses is available through the instruction set.
Most MCU architectures have only 1-2
accumulators and 1-2 pointers.
Since pointers are very frequently used in C,
the operations on pointers are very
important for speed and code size. The
AVR has addressing modes that directly
pre-decrement or post-increment the
pointers when they are used to access data
memory. In addition, table lookup or stack
operations can be performed efficiently by
using a displacement (relative to the current
pointer value) as shown in Figure 3. The
displacement range is 0-63 bytes, and detailed
analysis shows that this range is sufficient for
most lookups in structures and tables. One
very important note is that this fits into a single
word instruction!
To change a pointer by more than the single
step of the pre-decrement or post-increment
modes, addition and subtraction between a
16-bit pointer and a constant are implemented
in one cycle and a single word. The three
pointers can also be used as six 8-bit or
three 16-bit general-purpose working registers.
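The C idioms that these pointer addressing modes are meant to serve can be sketched as follows. This is an illustration only: how a particular compiler maps each idiom to an instruction is not specified here, and the type struct sample and all function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record; every field lies well within a small
 * displacement from the start of the structure. */
struct sample {
    uint8_t  id;
    uint8_t  flags;
    uint16_t value;
    uint8_t  payload[8];
};

/* Post-increment: both pointers advance after each access. */
static void copy_forward(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    while (n--)
        *dst++ = *src++;
}

/* Pre-decrement: the pointer steps back before each read. */
static unsigned sum_backward(const uint8_t *buf, size_t n)
{
    const uint8_t *p = buf + n;   /* one past the last element */
    unsigned sum = 0;
    while (p != buf)
        sum += *--p;
    return sum;
}

/* Displacement: a field at a fixed offset from a base pointer. */
static uint16_t read_value(const struct sample *s)
{
    return s->value;
}

/* Pointer plus constant: skip a fixed-size header. The length 4 is an
 * arbitrary value chosen for illustration. */
static const uint8_t *skip_header(const uint8_t *p)
{
    return p + 4;
}
```

Structure field access and short table lookups are the cases where a small fixed displacement pays off, since the base pointer does not have to be modified between accesses.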
In general it is important to use 8-bit numbers
in an 8-bit MCU, since all data memories are
8 bits wide. However, when using C, larger
numbers such as int (16 bits), long (32 bits) and
float (32-bit floating point) are frequently used.
Traditionally, computing on such numbers
generates very large code, but since the
AVR is designed to handle them, the AVR code
generated is extremely small.
Example 1 shows a small part of a routine
written in C and how it is translated to
assembly code.
Example 1:
void routine(void)
{
    long n1, n2;
    int n3;
    ...
    if (n1 != n2) n3 += 5;
    ...
}
n1 and n2 are both 32-bit numbers that
require 4 bytes each. n3 is an integer that
requires two bytes. All three variables are
defined as local variables within the routine.
In AVR assembly code this program example
is extremely compact, with n1 = R3-R0, n2 =
R7-R4 and n3 = R17:R16:
...
    CP   R0,R4            ; n1-n2 (byte 0)
    CPC  R1,R5            ; n1-n2-C (byte 1)
    CPC  R2,R6            ; n1-n2-C (byte 2)
    CPC  R3,R7            ; n1-n2-C (byte 3)
    BREQ EQUAL            ; Branch if equal
    SUBI R16,LOW(0xfffb)  ; n3+5 low byte
    SBCI R17,HIGH(0xfffb) ; n3+C high byte
EQUAL:
...
In most MCUs the comparison from Example 1
needs many more instructions, since they lack
Compare with Carry (CPC) and zero flag
propagation. Zero flag propagation has a
function similar to carry propagation: it carries
the result of the lower bytes into the
comparison of the higher bytes, so the zero
flag ends up set only if all bytes are equal.
Zero propagation is implemented on the
compare and subtract instructions. A compare
instruction is generally a subtraction without
storing the result. CPC was added to support
comparisons of numbers larger than 8 bits.
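The effect of zero flag propagation can be modelled in C. The sketch below is a simplified model with only the carry and zero flags, not a description of the actual hardware. The names struct flags, cp, cpc, compare32 and compare32_demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal model of the two status flags used by the comparison. */
struct flags {
    unsigned carry;  /* borrow out of the last byte subtraction */
    unsigned zero;   /* 1 while every byte compared so far was equal */
};

/* Compare without carry: both flags start fresh. */
static void cp(struct flags *f, uint8_t rd, uint8_t rr)
{
    f->zero  = (rd == rr);
    f->carry = (rd < rr);
}

/* Compare with carry: zero can stay set or be cleared, never become set. */
static void cpc(struct flags *f, uint8_t rd, uint8_t rr)
{
    unsigned diff = (unsigned)rd - rr - f->carry;   /* wraps on borrow */
    f->zero  = f->zero && ((diff & 0xFFu) == 0);
    f->carry = ((unsigned)rr + f->carry > rd);
}

/* Compare two 32-bit values byte by byte, lowest byte first. */
static struct flags compare32(uint32_t n1, uint32_t n2)
{
    struct flags f;
    cp(&f, (uint8_t)n1, (uint8_t)n2);
    for (int i = 1; i < 4; i++)
        cpc(&f, (uint8_t)(n1 >> (8 * i)), (uint8_t)(n2 >> (8 * i)));
    return f;
}

/* Only the lowest bytes differ here. The last three byte comparisons
 * all give zero, yet the zero flag must stay cleared. */
static void compare32_demo(void)
{
    assert(compare32(0x12345678u, 0x12345600u).zero == 0);
    assert(compare32(0x12345678u, 0x12345678u).zero == 1);
}
```

Without propagation the zero flag would reflect only the last byte compared, and an extra test per byte would be needed before the branch.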
Example 1 also shows how a Subtract
Immediate (SUBI) and a Subtract Immediate
with Carry (SBCI) can be used instead of
addition (ADDI and ADCI) to compute n3 + 5
by subtracting the 2’s complement of 5
(0xfffb). Since ADDI and SUBI are
complementary instructions, only one pair is
implemented, to leave decoding space for
other important instructions.
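The arithmetic behind this substitution can be checked with a short C model of the two byte-wide steps. It is a sketch of the idea, not of the instruction encoding, and the names add_by_subtracting and add_by_subtracting_demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add a 16-bit constant k to n3 using only byte-wide subtract steps:
 * subtract the two's complement of k, low byte first, then the high
 * byte together with the borrow from the low byte. */
static uint16_t add_by_subtracting(uint16_t n3, uint16_t k)
{
    uint16_t neg_k = (uint16_t)(0u - k);       /* k = 5 gives 0xFFFB */
    uint8_t lo   = (uint8_t)n3;
    uint8_t hi   = (uint8_t)(n3 >> 8);
    uint8_t k_lo = (uint8_t)neg_k;
    uint8_t k_hi = (uint8_t)(neg_k >> 8);

    unsigned borrow = (lo < k_lo);             /* carry flag after the low step */
    lo = (uint8_t)(lo - k_lo);                 /* subtract immediate, low byte */
    hi = (uint8_t)(hi - k_hi - borrow);        /* subtract with carry, high byte */

    return (uint16_t)((hi << 8) | lo);
}

/* The result matches ordinary 16-bit addition, including the case
 * where the low byte overflows into the high byte. */
static void add_by_subtracting_demo(void)
{
    assert(add_by_subtracting(0x1234, 5) == 0x1239);
    assert(add_by_subtracting(0x00FB, 5) == 0x0100);
}
```

A borrow in the low step means the equivalent addition did not overflow, which is why the high step subtracts 0xFF plus the borrow and leaves the high byte unchanged in that case.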
All the features discussed are implemented as
a result of code benchmarks and input from C
compiler experts. The result is a very fast
microcontroller with RISC performance and
CISC code density.
[1] “AVR Enhanced RISC microcontroller”,
data book, Atmel Corporation, May 1996.
Figure 3: SRAM Direct with Displacement