DSPLib For The Hitachi SH1 - Hitachi Semiconductor

DSPLib For The Hitachi SH1

7000 Series Microcontroller

Application Note

19-031/1.0

June 1996

 Hitachi Micro Systems Europe Ltd 1996

When using this document, keep the following in mind,

1, This document may, wholly or partially, be subject to change without notice.

2, All rights are reserved: No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without Hitachi’s permission.

3, Hitachi will not be held responsible for any damage to the user that may result from accidents or any other reasons during the operation of the user’s unit according to this

document.

4, Circuitry and other examples described herein are meant only to indicate the characteristics and performance of Hitachi’s semiconductor products. Hitachi assumes no

responsibility for any intellectual property claims or other problems that may result from applications based on the examples therein.

5, No license is granted by implication or otherwise under any patents or other rights of any third party or Hitachi, Ltd.

6, MEDICAL APPLICATIONS: Hitachi’s products are not authorised for use in MEDICAL APPLICATIONS without the written consent of the appropriate officer of Hitachi’s

sales company. Such use includes, but is not limited to, use in life support systems. Buyers of Hitachi’s products are requested to notify the relevant sales office when planning to

use the products in MEDICAL APPLICATIONS.

DSPLib For SH Application Note(19-031/1.0) Preface

Hitachi Micro Systems Europe Ltd i

Preface

This manual is intended as a user guide for the Digital Signal Processing library

DSPLib. DSPLib has been developed for use with the Hitachi ‘Super H’ range of RISC

microcontrollers, and in particular the SH-1 device.

The organisation of this manual is as follows :

Overview: Gives an overview of the library and its performance.

Function Descriptions: Describes the use of each function, including

background theory and software examples where

necessary.

DSPLib Implementation: Takes the reader through a typical DSPLib

implementation. Includes notes on hardware interfacing

to the SH-1 as well as examples of typical software

routines.

Related Manuals

The reader should refer to the following Hitachi manuals when using DSPLib:

DSPLib For SH User Guide

SH7032/SH7034 Hardware Manual

SH7000 Series Programming Manual

SH Series C Compiler

SH Series Cross Assembler

H Series Linkage Editor

For software development tools, contact your Hitachi sales office.

DSPLib For SH                               Application Note (19-031/1.0)                                           Contents
Hitachi Micro Systems Europe Ltd ii
TABLE OF CONTENTS
PREFACE ................................................................................................................................................i
0 OVERVIEW......................................................................................................................................1
1 GENERAL DESCRIPTION.......................................................................................................................1
1.2 Note About Data Types................................................................................................................2
2 FUNCTION TIMINGS .............................................................................................................................3
2.1 FFT Timings ................................................................................................................................3
2.2 Filter Timings ..............................................................................................................................3
2.3 Convolution & Correlation Timings............................................................................................4
0 FAST FOURIER TRANSFORMS (FFT).......................................................................................5
1 FFT BACKGROUND THEORY ...............................................................................................................5
1.1 Not In Place FFT.........................................................................................................................6
1.2 In-Place FFT ...............................................................................................................................7
1.3 FFT Scaling.................................................................................................................................8
1.4 Input / Output Array Format .......................................................................................................8
2 ROUTINE DESCRIPTIONS ......................................................................................................................9
2.1 FftReal.........................................................................................................................................9
2.2 FftComplex ................................................................................................................................13
2.3 IfftComplex ................................................................................................................................14
2.4 IfftReal.......................................................................................................................................16
2.5 FftInComplex.............................................................................................................................17
2.6 FftInReal....................................................................................................................................17
2.7 IfftInComplex.............................................................................................................................18
2.8 IfftInReal............................................................................................................... .....................18
0 WINDOW FUNCTIONS................................................................................................................21
1 FUNCTION DESCRIPTIONS ..................................................................................................................21
2 USING WINDOW FUNCTIONS..............................................................................................................24
3 FFT APPLICATION OF WINDOW FUNCTIONS......................................................................................24
0 FILTERS.........................................................................................................................................31
1 FILTER BACKGROUND THEORY .........................................................................................................31
1.1 Finite Impulse Response Filters (FIR).......................................................................................32
1.2 Infinite Impulse Response Filters (IIR)......................................................................................33
1.3 Filter Specification ....................................................................................................................35
2 ROUTINE DESCRIPTIONS ....................................................................................................................37
2.1 Fir..............................................................................................................................................37
2.2 Fir1............................................................................................................................................39
2.3 Iir...............................................................................................................................................41
2.4 Iir1..................................................................................................................... ........................43
2.5 DIir ............................................................................................................................................44
2.6 DIir1 ..........................................................................................................................................49
2.7 Lms.............................................................................................................................................50
2.8 Lms1...........................................................................................................................................57
0 CONVOLUTION AND CORRELATION ...................................................................................59
1 BACKGROUND THEORY.....................................................................................................................59
1.1 Convolution ...............................................................................................................................59
1.2 Correlation ................................................................................................................................60
2 ROUTINE DESCRIPTIONS ....................................................................................................................61
2.1 ConvComplete............................................................................................................................61
2.2 ConvCyclic.................................................................................................................................62

DSPLib For SH Application Note (19-031/1.0) Contents

Hitachi Micro Systems Europe Ltd iii

5.2.3 ConvPartial................................................................................................................................64

5.2.4 Correlate....................................................................................................................................65

5.2.5 CorrCyclic .................................................................................................................................67

5.3 CORRELATION EXAMPLE ...................................................................................................................68

6.0 MISCELLANEOUS FUNCTIONS...............................................................................................71

6.1.1 GenGWnoise..............................................................................................................................71

6.1.2 MatrixMult.................................................................................................................................72

6.1.3 VectorMult.................................................................................................................................73

6.1.4 MsPower....................................................................................................................................74

6.1.5 Mean..........................................................................................................................................74

6.1.6 Variance.....................................................................................................................................75

6.1.7 MaxI...........................................................................................................................................76

6.1.8 MinI ...........................................................................................................................................76

6.1.9 Peakl..........................................................................................................................................77

7.0 DSPLIB IMPLEMENTATION.....................................................................................................79

7.1 HARDWARE .......................................................................................................................................79

7.1.1 Analogue to Digital Conversion................................................................................................80

7.1.2 Digital To Analogue Conversion...............................................................................................81

7.1.3 Circuit Description....................................................................................................................82

7.2 SOFTWARE.........................................................................................................................................84

7.2.1 Main()........................................................................................................................................84

7.2.2 system()......................................................................................................................................86

7.2.3 adc(short*).................................................................................................................................87

7.2.4 detect(short*).............................................................................................................................88

7.2.5 dac(short*).................................................................................................................................89

7.3 Timing...........................................................................................................................................90

APPENDIX ONE..................................................................................................................................92

ENSIGMA SOFTWARE GUIDE FOR DSPLIB...............................................................................................92

DSPLib For SH Application Note (19-031/1.0) Overview

Hitachi Micro Systems Europe Ltd 1

1.0 Overview

1.1 General Description

DSPLib represents a number of standard ‘off the shelf’ fixed point DSP operations

which can be used either stand alone or in series to form an optimised DSP process

which will run on a microcontroller. The functions offered by the library come in five

distinct groups.

Fast Fourier Transforms

Window Functions

Filters

Convolution & Correlation

Miscellaneous

The library offers each function in both ANSI C source code and also optimised SH

assembler. The C code is provided only for debugging and reference. It is not

recommended that the C code be used in a finished system due to the inefficiencies

introduced by the C compiler.

As well as the DSP functions, DSPLib offers a number of example programs to

demonstrate to the user how easily DSPLib can be integrated into standard ANSI C

code.

In order to reduce development times, this document has been written to take the reader

through each function indicating typical pitfalls, as well as hints to improve execution

speed. The document forms a type of user guide for the main part, and towards the end

indicates some hardware design considerations as well when designing with the Hitachi

SH series.

All the functions offered by DSPLib do not use any local static variables, thus enabling

the functions to be called from interrupt routines without affecting the operation of the

main program.

In order to use DSPLib with the Hitachi SH series of microcontrollers, the user will

require the Hitachi tool chain to build programs. This tool chain consists of the

following :-

SH Series Cross Assembler

SH Series C Compiler

H Series Linkage Editor

DSPLib For SH Application Note (19-031/1.0) Overview

Hitachi Micro Systems Europe Ltd 2

Although a set of GNU tools are available for the Hitachi SH series, the optimised SH

assembler in DSPLib is not compatible, and is therefore unsuitable. The ANSI C code

can be compiled by the GNU tool, but will result in a very inefficient implementation.

DSPLib is supplied on a 3.5” disk in MS-DOS format. The disk consists of a file

ensigma.lib which contains the entire library ready built in object form. Also on the

disk is a header file called ensigdsp.h, as well as the ANSI C and optimised assembler

functions. Some example code is also supplied on the disk, details of which may be

found in the related Ensigma documentation. A number of test routines are included to

allow the user to verify correct operation.

The directory structure of DSPLib is as follows :-

When installing DSPLib, create a directory for the above tree to sit in. The Hitachi tools

should be installed in a separate directory, and the path then set such that the include and

lib directories of DSPLib are visible when building programs. To copy the library onto

a hard drive called c:\, place DSPLib into drive A:\ and type the following :-

XCOPY A:\* C:\shc\shc2.0 /s

Further details on installation and library building are included in the related Ensigma

documentation.

1.1.2 Note About Data Types

DSPLib makes use of a number of ANSI C data types. Representation of these types on

an SH-1 device are as follows :-

Byte 8 bits

Word 16 bits

Long 32 bits

Float 32 bits

Double 64 bits

DSPLib For SH Application Note (19-031/1.0) Overview

Hitachi Micro Systems Europe Ltd 3

1.2 Function Timings

The execution times are based on a 20Mhz SH1 device, and should be adjusted

accordingly for other clock speeds.

1.2.1 FFT Timings

The timings are given below in milliseconds for the execution speed of the FFT

functions.

Not In Place

Forward FFT Inverse FFT

Size Complex Real Complex Real

128 1.94 1.06 2.19 1.20

256 4.43 2.37 4.94 2.66

512 9.98 5.29 11.00 5.88

1024 22.23 11.70 24.28 12.88

In Place

Forward FFT Inverse FFT

Size Complex Real Complex Real

128 1.94 1.06 2.20 1.19

256 4.46 2.37 4.97 2.64

512 10.03 5.32 11.06 5.86

1024 22.39 11.75 24.44 12.83

1.2.2 Filter Timings

The timings are given below in microseconds for the execution speed of the filter

functions. The multiple sample routines are quoted in a per sample format, computed

by dividing the computational time for 100 samples by 100.

Function

No. Of Coeffs FIR LMS FIR1 LMS1

5 8.6 24.9 17.2 34.7

10 10.3 40.6 18.9 50.4

100 46.1 328.4 54.7 338.2

Function

No. Of

Biquads IIR DIIR IIR1 DIIR1

1 10.5 40.1 18.1 48.0

2 19.7 77.6 28.8 86.4

20 185.5 752.8 221.4 784.3

DSPLib For SH Application Note (19-031/1.0) Overview

Hitachi Micro Systems Europe Ltd 4

1.2.3 Convolution & Correlation Timings

The timings are given below in microseconds for the execution speed of the various

convolution and correlation functions offered by DSPLib. The timings are quoted per

output sample, thus found by dividing the time to produce n output samples by n.

The table shows the computational time per output sample for a particular input array

size iw. It should be noted that iw is always the smaller of the two input arrays.

Routine

Size Of

iw ConvPartial ConvCyclic CorrCyclic Correlate

5 9.7 9.7 9.0 8.1

10 10.6 11.2 10.2 9.0

100 54.6 55.4 47.7 46.4

NOTE :- Special Case

In certain situations, computation takes place outside the input array boundaries. When

this happens, the results formed from elements outside the boundary are not calculated.

The effect of this is that the computational time decreases. An example of such a

situation is when the two input arrays are the same size. When this happens, there is

only one instance when the solution is formed entirely from elements inside the

boundary. For all other instances, the number of computations per sample reduce.

Computation outside the array boundaries can take place in the ConvComplete and

Correlate functions only. Taking the extreme case, when both the input arrays are the

same size, the timings per sample are given below. Once again, the timings were found

by taking the time required to produce n output samples, and dividing by n.

All timings quoted below are given in microseconds.

Routine

Size Of

iw No. Of Output

Samples ConvComplete Correlate

5 9 6.5 8.4

10 19 7.1 8.2

100 199 29.0 26.1

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 5

2.0 Fast Fourier Transforms (FFT)

Introduction

The Fourier Transform provides a method of performing frequency analysis on a given

time domain signal. The FFT of a continuous time domain signal returns the

corresponding frequency domain equivalent. The use of a ready designed software

routine releases the applications engineer from the burden of complicated mathematics

involved with spectrum analysis. As a result of this, the designer is able to concentrate

directly on the development of the specific application. Real world applications of the

Fourier Transform are common place, and i nclude such examples as noi se analysis (e.g.

analysing E.M.C. compliance), spectrum estimation as well as numerous

telecommunication processes.

DSPLib provides routines which calculate the classical forward and inverse FFTs in

Radix 2 form. Each routine is described in detail below, with examples of operation

given where necessary.

2.1 FFT Background Theory

The purpose of this section is to reiterate some basic FFT theory such that the reader

will gain an appreciation of the function of each of the software routines described in

this chapter.

DSPLib provides basic building blocks that allow the calculation of Radix two

decimation in time transforms. Radix two implies that the time domain input data is

required in blocks that are a power of two. This means that an FFT cannot be performed

on a sample by sample process, but instead must be presented in blocks of samples. The

choice of size of the block of time domain samples presented to the FFT is determined

by a number of factors such as:-

Frequency Content Of Signal

Required Resolution In Frequency Domain

Processor Time Available To Perform FFT

Must Be Power Of Two (Radix Two)

It is necessary for the designer to examine each of the above considerations and reach a

favourable compromise in order to implement an efficient FFT for a specific

application.

Although in-depth mathematical knowledge of the FFT is not required to use theDSPLib

FFT functions, an appreciation of the structure is desirable, thus enabling the designer to

understand the required formats for data input and output, as well as understanding the

results generated by the FFT.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 6

The forward Fourier transform is defined by the following equation:-

yn e x

sjnN

() .

=−−

∑

Conversely the inverse transform may be written as :-

yn e x

sjnN

() .

=−

∑

The exponential powers shown are known as the twiddle factors and are used to

combine two data points within the FFT. To improve on computational time the

twiddle factors are computed first and stored in a look-up table. It is desirable to ensure

that the look-up table is stored in internal RAM in order to minimise memory access

times and hence improve FFT performance. In the case of DSPLib the look-up tables

are generated by executing the InitFft routine first.

DSPLib offers two distinct structures of FFT. Both will return bit identical results, and

both have near identical execution speeds, however the use of memory differs for the

storage of results.

2.1.1 Not In Place FFT

This version of the FFT takes the block of time domain samples from RAM, performs

the FFT and returns the result to a new location in RAM. The structure of a not-in-place

FFT can be seen in diagram 2.1 below.

W0 W0 W0

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

X(0)

X(1)

X(2)

X(3)

X(4)

X(5)

X(6)

X(7)

T ime D o m a in S amp les F requ en cy D o m a in Samp le s

Diagram 2.1 : Not In-Place FFT Flowgraph

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 7

It should be noted from Diagram 2.1 that the inputs to the FFT are not in order. In order

for the FFT to be performed, the data order must be re-arranged by way of bit reversal.

Since this operation is performed by the SH, the bit reversal technique is implemented

in software (unlike some DSP cores which have bit reversed addressing to enhance

execution speed) by the library code, and is of no concern to the user. The output of the

FFT is in order, and so no further modifying of the addresses of the samples is

necessary.

The weightings applied at each of the butterflies on the Flowgraph are the twiddle

factors which were described earlier. The twiddle factors are stored in a look-up table in

order to enhance speed.

2.1.2 In-Place FFT

This version takes the block of time domain samples in memory, performs the FFT

operation and deposits the results into the same memory locations in RAM. This

method has the advantage of using less memory space (50% less), at the expense of

execution time. The hardware arrangements for in-place FFTs require careful planning,

since for each sample that is processed there are two memory accesses, a read and a

write to the same location. To maximise speed, on chip RAM should be used.

Stage 1 Stage 2 Stage 3 X(0)

X(4)

X(2)

X(6)

X(1)

X(5)

X(3)

X(7)

x(0)

x(1)

x(2)

x(3)

x(4)

x(5)

x(6)

x(7)

Time Domain

Samples Frequency Domain

Samples

Diagram 2.2 : In-Place FFT Flowgraph

As can be seen in the above flow diagram, bit reversal is still necessary with in-place

FFTs. The above diagram shows bit reversal is necessary on the output data. In reality,

the difference in execution time between the two functions is negligible. Therefore, the

only real consideration when deciding on the implementation is the amount of on chip

RAM that is likely to be needed. For maximum conservation of memory space, use the

in-place implementation.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 8

As with any process where time is important, it is recommended that the whole of

DSPLib is executed from on chip RAM.

2.1.3 FFT Scaling

When implementing an FFT, it is important to have an understanding of the signal being

presented to it such that overflow does not occur at any stage. There are a number of

different methods of preventing overflow, all based around the scaling of the input data.

When deciding on the scaling applied, it is important to consider the quantisation

introduced against the likelihood of overflow. The related Ensigma text describes in

detail the method of scaling employed by DSPLib, and should be carefully considered

before implementing a transform. For most applications the scaling EFFTALLSCALE

(0xFFFFFFFF) will be suitable. This level of scaling simply performs halfing on every

stage, with a guarentee that no overflow will occur.

To summarise FFT scaling, here are the three levels offered by DSPLib.

EFFTALLSCALE :- Halving performed on every stage, guaranteeing no overflow if the

input samples are all less than 230 .

EFFTMIDSCALE :- Halving performed on alternate stages, hence giving an output with

the same power as the input.

EFFTNOSCALE :- No scaling performed. Signal properties must be well understood

when using this.

2.1.4 Input / Output Array Format

DSPLib offers both real and complex FFT computation, and so the format and ordering

of data arrays presented to the transform must be understood. Firstly, we will look at

purely real arrays, as used in all the real forward and inverse transforms. The order of

the array is as follows:-

x[2n] = Re{C(n)} i.e. all elements are real on input

x[2n+1] = 0 all imaginary components are zero (real signal)

For a real transform, the input array contains real time domain samples and so contains

no phase information. The output from the transform is the frequency domain

representation of the signal. The output of the FFT is in complex array format (see

below), however, in the case of a real signal all the complex samples are zero. Each real

sample represents the signals frequency content at a certain point. The highest sample,

i.e. the last sample represents Fs/2. I.e. the FFT only returns frequency data for half the

sampling frequency.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 9

The complex transforms require data to be presented in complex array format, and

similarly the results are returned in the same format. The ordering of a complex array is

as follows:-

x[2n] = Re{c(n)} Real elements

x[2n+1] = Im{c(n)} Imaginary elements

I.e. the input and output elements are ordered :- {real, imaginary, real, imaginary ......}

2.2 Routine Descriptions

2.2.1 FftReal

This function performs a purely real forward fast fourier transform. For a full

description of the definition of the function, the Ensigma documentation should be

consulted.

The routine is defined as follows :-

int FftReal( output, input, size, scale)

Where,

short output[] positive frequency output data

short input[] real input samples

long size size of FFT

long scale scaling applied to FFT

Using FftReal

The use of FftReal, as with any other function in DSPLib is relatively straight forward.

The user must define the scaling and size of the function, as well as present input data in

a suitable format.

This example, and others in this section use the LogMagnitude function from DSPLib to

present the output of the FFT in log magnitude format. Although the use of this

function is included in the text, further details of this function are included in the

Ensigma documentation.

Example

The following example outlines a simple application of the FftReal function, intended as

a starting point for the reader when developing more specific applications. In this

example, the signal applied to the FFT contains two frequency components with

different amplitudes. The FFT is used to determine the frequency and amplitude of the

components of the signal.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 10

Data presented to the FFT must be 16 bits or less(including the sign bit), and in blocks

that are a power of two (radix two operation).

In our case, the input consists of 128 samples. The output will consist of 64 samples

interlaced with zeros, confirming that the input is real, since all complex samples are

null.

It is necessary for the user to determine the size of the FFT, and also the scaling applied

during calculation. Since there are 128 input samples a 128 point FFT is required for a

single block computation. Referring to the FFT structure in diagram 2.1, the required

size of FFT becomes apparent.

In the diagram, an eight point FFT is shown, and is capable of processing data in blocks

of eight. Therefore, in our case, the 128 input samples can be processed in one block

using a 128 point FFT.

The C code implementation is shown below, and shows how DSPLib is combined in a

standard ANSI C program. The array INPUT has been created from an earlier source,

either an ADC or generated by the SH in a routine. The samples in INPUT are

transformed and written to the array OUTPUT.

#include <stdio.h>

#include <math.h> /* Typical Libraries To Be Included */

#include <ensigdsp.h>

#define NSAM 128 /* Number Of Input Samples*/

#define NSAMS NSAM/2 /* Number Of Output Samples From LogMagnitude */

void main(void)

{short OUTPUT[NSAM], LOGOUT[NSAMS];

float fscale;

fscale = 1;

/* Enter Code Here To Generate INPUT Data */

/* Generate Twiddle Factor Look-Up Tables */

if(InitFft (NSAM) != EDSP_OK)

printf(“Problem With Generating Look-Up Table”);

/* Perform FFT Operation */

if(FftReal (OUTPUT, INPUT, NSAM, EFFTALLSCALE) != EDSP_OK)

printf(“Problem With Performing FFT Operation”);

/* Log Magnitude Results Returned By FFT */

if(LogMagnitude( LOGOUT, OUTPUT, NSAM, fscale) != EDSP_OK)

printf(“Problem With LogMagnitude Routine”);

}

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 11

The above example uses a total of three functions from DSPLib, and is centred around

the FftReal function described above.

InitFft calculates the twiddle factors and stores them in a look-up table which is used by

FftReal during its execution. More details of InitFft can be found in the Ensigma

documentation.

LogMagnitude takes the complex output data from FftReal, and returns a purely real

logarithmic result. The scale in LogMagnitude may be set to suit the application, and in

this case is set to one. LogMagnitude can be found in the Ensigma text. Diagram 2.3

below shows a plot of the 128 samples contained in the array INPUT. The FFT

processed these samples and returned the results to OUTPUT. Following the

completion of FftReal, LogMagnitude calculated the log format of the results, and this

can be seen in diagram 2.4 below.

-1500

-1000

-500

500

1000

1500

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127

Diagram 2.3 :- 128 Time Domain Input Samples

-10

1 4 7 10131619222528313437404346495255586164

Diagram 2.4 :- Result Of FFT (64 Frequency Domain Samples)

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 12

The result of the FFT is shown in diagram 2.4 above. There are only two samples which

are not zero, and these represent the two frequency components of the signal shown in

diagram 2.3.

It should be noted that the result of LogMagnitude is half the length (64 samples) of the

original FFT result (128 samples). This is due to the fact that the real and imaginary

samples are squared, added together and then logged.

Analysis of the above plot in diagram 2.4 will give us detailed information on the

spectrum of the signal in diagram 2.3. The plot is analysed as follows :-

The two peaks appear at samples 4 and 16. Since sample 64 is known to correspond to

Fs/2 (half the sampling frequency), other samples may be calculated as follows to form a

table :-

Frequency Sample

Fs/2 64/1 64

Fs/4 64/2 32

Fs/8 64/4 16

Fs/16 64/16 4

Hence, from the above table we know that our signal contains two components, one at

Fs/4 and the other at Fs/16, which correspond to samples 16 and 4. Next it is necessary

to determine the amplitude of those components. Fs/16 and Fs/8 have amplitudes of

20dB and 40dB respectively. Since the log of a signal is 20 logVpk, where Vpk is the

peak amplitude of the signal, the original signal amplitudes may be found as follows:-

Amplitude of Fs/16 = anti-log (20/20) = 10

Also

Amplitude of Fs/8 = anti-log (60/20) = 1000

So the result of our FFT process is the complete frequency content of the original signal

up to and including Fs/2. Clearly higher in the spectrum is of no concern to us due to

the aliasing effects of digitisation.

The method for determining the amplitude of the components shown above is only

applicable if EFFTALLSCALE is used, and if the scale in LogMagnetude is set to one.

For most applications, this arrangement will be satisfactory and provide the simplest

implementation.

This example has omitted the use of windows when taking FFTs. Windows are used to

compensate for the ‘end effect’ encountered when taking the transform of a finite data

set. The sudden end of the data generates many harmonics which strictly speaking are

not present in the signal. The result of this is that the FFT represents frequency

components which are of no interest to the user.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 13

In order to counter effect this, the data is applied through a window, which has the effect

of ‘fading’ out the start and finish, thus reducing harmonics caused by straight edges.

There are a number of different window shapes available, and the omittance of a

window is referred to as rectangular windowing, and applies to the above example.

DSPLib provides four windows, being Blackman, Hamming, Hanning and Triangle.

These functions and their use are described in section 1.3.2 .

2.2.2 FftComplex

This routine allows the user to present both real and complex data to the FFT for

processing. The result of the FFT is in the same format as that in FftReal, only the

complex values will be non-zero(assuming the complex input data is non-zero). The

use of the routine is very similar to FftReal, and follows on directly from its description.

It is recommended that the reader looks at the FftReal text in the previous section before

continuing here.

The complex FFT is essentially two separate FFTs, one for the complex data and one for

the real. These FFTs are completely separate, and do not interact in anyway.

FftComplex is not-in-place, and hence returns its solution to a different memory

location to that of the input data. Diagram 2.1 shows the structure of a not-in-place

FFT.

The related Ensigma documentation gives detailed information regarding the definition

of FftComplex, however there now follows a brief summary of the function for your

reference.

The routine is defined as follows :-

int FftComplex( output, input, size, scale)

Where,

short output[] Complex Output Data

short input[] Complex Input Data

long size Size Of FFT

long scale Scaling Applied To FFT

Use Of FftComplex

FftComplex is essentially the same to use as FftReal, and so the previous example is

applicable here. The difference between the two functions is that FftComplex allows

complex data to be processed. Therefore, the input and output arrays are both in

complex array format, and so the user should ensure familiarity with this.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 14

When specifying the size of the FFT, it is important to remember that half the inputs are

complex. Therefore, if the user has 128 real samples and 128 complex samples, and it is

necessary to process the samples in one block, a 256 point FFT will be required.

A block diagram representation of the above instance is shown in diagram 2.5. The

diagram shows how two N point FFTs are required to process the real and complex data

samples. Typically the complex samples are representations of the signals phase

characteristics, but as explained below, this doesn’t have to be the case.

N Point FFT

Real Input Samples

x0,x2,x4,x6,....xn

Imaginary Input Samples

x1,x3,x5,x7,....xn+1

Real Output Samples

y0,y2,y4,y6,....yn

Imaginary Output Samples

y1,y3,y5,y7,....yn+1

Complex FFT

Diagram 2.5 :- Representation Of FftComplex Showing Two Separate FFTs

As the above diagram shows, the FftComplex routine is actually two separate interlaced

FFTs. This fact can be exploited by way of performing two real FFTs simultaneously.

Real data can be applied to both the real and complex inputs, with the corresponding

results available in the odd and even samples at the output. In certain applications, this

method can offer time savings over performing two separate FftReal routines.

2.2.3 IfftComplex

This function offers the exact opposite to FftComplex. IfftComplex calculates the

inverse FFT and returns a not-in-place solution. The user presents the routine with a set

of frequency domain samples in complex array format. The highest sample in the array

must correspond to the frequency Fs/2s’ imaginary component. If the signal is real, all

the imaginary values (odd addresses) will be zero. The transform will return the

equivalent time domain signal for the given frequency data. The time domain data will

be in complex array format, and will require separating into real and imaginary if

necessary. When applying data to IfftComplex, it is important that the samples are not

in logmagnitude form, since this will generate an invalid return on the output.

As with all the FFT routines in DSPLib, the user must specify scale and size. Full

details of the routines description are in the related Ensigma documentation.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 15

The routine definition is summarised below for your reference:-

int IfftComplex( output, input, size, scale)

Where,

short output[] complex output data

short input[] complex input data

long size size of inverse FFT

long scale scaling applied during IFFT execution

Use Of IfftComplex

The routine is very simple to use, and as described above is the exact opposite to the

FftComplex routine described. One method of establishing correct operation is to

perform both the forward and inverse transforms on a data set, then comparing the

results to the original data input. If the results are different, there is a problem with one

or both of the functions, such as overflow, excessive quantisation or simply scaling. An

example of such a program is given below.

#include <stdio.h>

#include <math.h> /* Typical Libraries To Be Included */

#include <ensigdsp.h>

#define NSAM 128 /* Number Of Input Samples*/

#define NSAMS NSAM/2 /* Number Of Output Samples From LogMagnitude */

void main(void)

{short OUTPUT[NSAM], OUTPUTA[NSAM], LOGOUT[NSAMS];

float fscale;

int I;

fscale = 1;

/* Enter Code Here To Generate INPUT Data */

if(InitFft (NSAM) != EDSP_OK)

printf(“Problem With Generating Look-Up Table”);

if(FftComplex (OUTPUT, INPUT, NSAM, EFFTALLSCALE) != EDSP_OK)

printf(“Problem With Performing FFT Operation”);

if(IfftComplex (INPUTA, OUTPUT, NSAM, EFFTALLSCALE) != EDSP_OK)

printf(“Problem With Performing IFFT Operation”);

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 16

for(i=0; I < NSAM; I++)

{if((INPUT[i] - INPUTA[i]) != 0)

printf(“Problem With Format Of Results”);

}

The above C code shows how both FftComplex and IfftComplex are used as functions

in a program. The code performs a complex forward not-in-place FFT on the data

stored in INPUT. The resultant frequency domain data then has an inverse FFT applied

and the result stored in INPUTA. If both routines are operating correctly, the values

stored in INPUTA should be identical to INPUT. This is tested by subtracting the two

and comparing to zero. The result is compared to zero, and if different, an error

message is displayed to indicate to the user that the functions are not operating correctly.

The reader should note the use of the same scaling applied to both forward and inverse

transforms. As discussed above, a number of factors can lead to the data being

different, and most can be ‘ironed’ out by the user during development.

2.2.4 IfftReal

IfftReal presents a function which calculates the inverse not-in-place FFT. This

function follows on from the above description of IfftComplex, with the exception of

the format of the returned data. The user presents the routine with complex frequency

domain samples, with the highest sample corresponding to Fs/2. The inverse transform

then returns a purley real time domain representation of the input data. Further details

regarding the definition of the function may be found in the related Ensigma

documentation. For your reference the definition of the function is as follows:-

int IfftReal( output, input, size, scale)

Where,

short output[] real output data

short input[] complex input data

long size inverse FFT size

long scale scaling applied

Using IfftReal

When using IfftReal, the input must be in complex array format. However, the signal

returned from the function must be real, and therefore the input must represent a real

signal. Therefore, the complex input array must represent a positive frequency, and so

all complex array elements must be zero. Before calling this routine the twiddle factors

should be generated by calling InitFft.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 17

2.2.5 FftInComplex

This routine performs the same function as FftComplex which was described earlier in

this section. The user should choose this routine when memory space is of the essence.

FftInComplex calculates the complex forward FFT in-place. The flowgraph for an in-

place FFT can be seen in diagram 2.2. The function is described in detail in the related

Ensigma documentation. For your reference, the definition of the function is as

follows:-

int FftInComplex( data, size, scale)

Where,

short data[] complex input and output data

long size FFT size

long scale FFT scaling applied

Using FftInComplex

As described this function is identical in operation to FftComplex except for the use of

memory. The function takes the input data from memory (stack) performs the

calculations, then returns the results to the SAME memory location. This enables the

user to maximise the use of memory space, but at the expense of overwriting the input

data.

2.2.6 FftInReal

Not surprisingly this follows the same format as FftReal, except that the FFT is

performed in-place. The user should refer to the section on FftReal for details of

operation when using this routine, and also reference to the related Ensigma

documentation would be useful. For your reference the definition of FftReal is as

follows:-

int FftReal( data, size, scale)

Where,

short data[] real data input, complex data output

long size FFT size

long FFT scaling applied

Using FftInReal

The input to the FFT is real data only, where as the output is in complex array format

with the complex array elements equal to zero. To allow the FFT to be calculated in-

place, the Fs/2 frequency output is stored in the second element of the array.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 18

This slightly complicates the output, and should be remembered when using. For more

details consult the Ensigma documentation.

2.2.7 IfftInComplex

This routine represents an in-place implementation of the complex inverse FFT. Its

operation is essentially the same as IfftComplex, only with an in-place computation.

As with all of the in-place routines, IfftInComplex should be chosen when memory

usage is critical. The routine is described in the related Ensigma documentation on.

However, for your reference, here is the definition of IfftInComplex:-

int IfftInComplex( data, size, scale)

Where,

short data[] complex input/output data

long size FFT size

long scale FFT scaling applied

Using IfftInComplex

Operation is identical to the IfftComplex routine described only the results of the inverse

FFT are written to the same memory location as the input. Hence all input data is

overwritten, but memory usage is optimised. As with any of the DSPLib routines, it is

desirable to locate data in on chip memory.

2.2.8 IfftInReal

This final routi ne in the FFT sect ion implem ents a real inverse in-place FFT. Operation

is identical to IfftReal described earlier in this section, with the exception of the in-place

computation returning the results to the same memory location as the inputs. Full

details of IfftInReal can be found in the related Ensigma documentation. For your

reference the function is defined as follows:-

int IfftInReal( data, size, scale)

Where,

short data[] Complex positive frequency data input, real data output

long size Inverse FFT size

long scale FFT scale applied

Using IfftInReal

As described, this method offers in-place inverse real FFT calculation. For details on

the use of this function refer to the section detailing IfftReal. Real output data is in-

place and so overwrites the original input data. Use of this function should be limited to

applications where memory usage must be optimised.

DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms

Hitachi Micro Systems Europe Ltd 19

When applying complex data to IfftInReal, the Fs/2 value should be placed in the

second element of the array where the imaginary component of zero would normally be

stored. Although this might complicate matters some what, it enables the function to

operate in-place.

Summary

This section is intended to give the reader an insight into the operation of each of the

FFT functions. While keeping the descriptions as general as possible to suit most

applications, the descriptions should enable the user to choose between functions when

designing a particular application, as well as pointing out certain design considerations.

The detail and theory in this document is not intended to form a reference for FFT

design, but should instead provide a starting point from where further DSP research may

begin.

NOTE

It is important to remember to initiate the twiddle factor tables before using any of the

FFT routines. These tables are initialised by using the function InitFft in the DSPLib.

Details regarding both this function and the logmagnitude function can be found in the

related Ensigma documentation.

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 21

3.0 Window Functions

Introduction

This section of DSPLib provides four window generators which may be used for a

number of applications, but essentially are used for counter-effecting the ‘end-effect’

experienced when taking the FFT of a finite data length.

Each of the four routines provide efficient methods of calculating the window, and

deposit the results into a user specified memory location. The particular window is then

recalled when required and applied accordingly to the desired data. The four windows

are described below, along with plots of their associated shapes.

Since a window is a continuous signal between limits, it is necessary to divide the

window into equally spaced samples. The number of samples used to represent the

window is specified by the user when calling the function. Clearly the more samples

used, the greater the accuracy of representation.

3.1 Function Descriptions

The window generators in DSPLib are defined as follows :-

int GenBlackman( output, win_size)

int GenHamming(output, win_size)

int GenHanning(output, win_size)

int GenTriangle(output, win_size)

Where,

short output[] array to containing window coefficients

long win_size size of window N and length of output[]

Each function will return EDSP_BAD_ARG if the user specifies a window size that is

smaller than one. If the function executes correctly, the flag EDSP_OK is returned.

These flags are intended for debug use, and will help the user identify problems in code

development.

The functions generate a window to a size specified by the user. The shape of the

generated window can be seen in respective diagrams below. Each diagram shows the

window when size was selected as 128. Therefore, 128 samples (or window

coefficients) are used to represent the windows.

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 22

5000

10000

15000

20000

25000

30000

35000

1 9 17 25 33 41 49 57 65 73 81 89 97 105 113 121

Diagram 3.1 :- Blackman Window

The Blackman window for 128 samples is shown above in Diagram 3.1. It can be seen

that the side lobes actually extend right down to zero, as well as having a wide centre

lobe. Mathematically, the Blackman window is defined as follows:-

wn n

( ) ( ) .42 . cos . cos

=− −



































≤

210 05 2008 4

ππ

0 n < N

5000

10000

15000

20000

25000

30000

35000

1 9 17 25 33 41 49 57 65 73 81 89 97 105 113 121

Diagram 3.2 :- Hamming Window

The Hamming window for 128 samples is shown above in Diagram 3.2. It can be seen

that the side lobes do not extend right down to zero, but it does have a very wide centre

lobe. Mathematically, the Hamming window is defined as follows:-

wn n

( ) ( ) . .46cos

=− −























≤

210540 2

0 n < N

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 23

-5000

5000

10000

15000

20000

25000

30000

35000

1 9 17 25 33 41 49 57 65 73 81 89 97 105 113 121

Diagram 3.3 :- Hanning Window

The Hanning window for 128 samples is shown above in Diagram 3.3. It can be seen

that the side lobes extend right down to zero, and it has a very wide centre lobe.

Mathematically, the Hanning window is defined as follows:-

wn n

() ()

cos

=−−





















≤

212

0 n < N

5000

10000

15000

20000

25000

30000

35000

1 9 17 25 33 41 49 57 65 73 81 89 97 105 113 121

Diagram 3.4 :- Triangular Window

The Triangular window for 128 samples is shown above in Diagram 3.4. It can be seen

that the side lobes extend right down to zero, and it has a very narrow centre lobe.

Mathematically, the Triangular window is defined as follows:-

wn nN

() ( )

=−−

−+









≤

211

15 0 n < N

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 24

3.2 Using Window Functions

The use of window functions in DSPLib is very simple, and is demonstrated in the

section of C code given below. It should be remembered that the execution of each of

the window functions is relatively slow, and increases with the size of the window. In

applications where time is an issue, such as real time processing, the window should be

pre-calculated and stored in a look-up table. It is preferable to locate this table in on

chip RAM inorder to minimise access times. The example below shows the Blackman

window being generated. The values generated are stored on the stack in an array

defined by the user. Because, ideally the stack is l ocated in on chip RAM, access to t he

generated values should be fast. The location of the stack is defined by the user in the

linkage editor.

#include <stdio.h>

#include <math.h> /* Typical libraries */

#include <ensigdsp.h> /* MUST be included */

#define WINLEN 128 /* Define length of window */

void main(void)

{short output[WINLEN]; /* Define array to contain window */

if(GenBlackman( output, WINLEN) != EDSP_OK) /* Generate window */

printf(“ Problem Generating Blackman Window”); /* Report if error has

occurred */

}

When the above code is executed, the window values are stored in an array called

output[] of length 128. Although there is not an upper limit to the size of window

generated, care should be taken to ensure that the stack is sufficintly large to

accomodate all the generated values, as well as other val ues being used in the programs

execution.

The execution speed of the routines is of less importance than other routines described

in DSPLib. The window function is typically used to generate values which are stored

in a look up table for use in a real time loop. Therefore an optimised version is not

available in DSPLib. The running time of the routine can be improved by the removal

of the argument checking section of the code. This is only recommended once the

specified arguments are known to be correct and generate the desired output.

3.3 FFT Application Of Window Functions

Before reading this section, the reader should ensure familiarity with the FFT functions

offered by DSPLib. Section two of this document will provide the necessary

background material relating to this example.

As described earlier, window functions have a number of different uses ranging from

filter design to FFT pre-processing. It is the latter that is discussed here.

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 25

The FFT of a finite data set generates harmonics which are not related to the actual

signal, and so leads to false solutions. This effect is caused by the sudden end of the

input data, resulting in straight edges at either end. As familiarity with Fourier series

representation would confirm, a great number of harmonics (infinite for exactly vertical)

are required to represent a straight edge. It is these harmonics whi ch are quite correctly

represented by the FFT.

-10

1 4 7 10131619222528313437404346495255586164

Diagram 3.5 :- Desired FFT Result

Diagram 3.5 shows the desired result from the application of an FFT to a particular

signal. The signal in question has two frequency components, one at Fs/4 and the other

at Fs/32. These two frequencies correspond to samples 32 and 4 respectively. With the

introduction of a finite data set, with a sudden end to the samples corresponding to a

straight edge, the resulting FFT is as shown in Diagram 3.6 below. Clearly a great

number of harmonics have been generated which spread right across the spectrum of

interest. The two frequency components are visible, but are completely encompassed by

harmonics.

1 4 7 10131619222528313437404346495255586164

Diagram 3.6 :- Result Of FFT On Finite Data Length

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 26

In order to counter effect the harmonics, the finite time domain signal is multiplied by a

window of the same length as the data set. The result of this operation is shown below.

The effects of different windows are shown, with varying degrees of success. The C

code implementation of this example is given at the end of this section.

-10

1 4 7 10131619222528313437404346495255586164

Diagram 3.7 :- Result Of Blackman Windowing

-10

1 4 7 10131619222528313437404346495255586164

Diagram 3.8 :- Result Of Hamming Windowing

(Continued Over)

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 27

-10

1 4 7 10131619222528313437404346495255586164

Diagram 3.9 :- Result Of Hanning Windowing

-10

1 4 7 10131619222528313437404346495255586164

Diagram 3.10 :- Result Of Triangular Windowing

As can be seen in Diagram 3.9, the Hanning window generates the best result. The

window has removed most of the harmonics, leaving a clear frequency representation of

the finite data. In most applications, this type of ‘trial and error’ approach will suffice.

The C code implementation is as follows :-

/**

/** DESCRIPTION: This file Generates a sine wave, applies a Hamming window

/** applies an FFT and then returns the solution in log format

#include <stdio.h>

#include <math.h>

#include <ensigdsp.h>

#include "testhead.h"

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 28

#define NSAM 128 /* number of samples */

#define NSAMS NSAM/2

#define TWOPI 6.283185307

/***************** Exercise To Take The FFT Of Data *******************/

void main(void)

{

/** Declare Local Variables **/

short sinea1[NSAM], window[NSAM], signal[NSAM], sinea[NSAM],

sineb[NSAM],fftouta[NSAM],outputa[NSAMS],outputb[NSAMS];

int n, k, len, res_shift;

float scale, scalefft;

scale = 1;

res_shift = 15;

len = NSAM;

scalefft = EFFTALLSCALE

/*** Initiate Twiddle Values ***/

if(InitFft (MAXSIZE) != EDSP_OK)

printf("Problem With Initialisation Of FFT");

/*** Generate Sine Wave With Sudden End***/

if(sine == 0)

{

printf("Generating Sine Wave");

k = len / 64;

for(n = 6; n < 122; n++)

{

sinea1[n] = floor(1000 * sin(TWOPI * k * n / len) + 0.5);

}

k = len / 4;

for(n = 6; n < 122; n++)

{

sineb[n] = floor(1000 * sin(TWOPI * k * n / len) + 0.5);

}

printf("Done");

}

/*** Add Sine waves ***/

for(n = 6; n < 122; n++)

{

sinea[n] = sinea1[n] + sineb[n];

}

for(n = 0; n < 6; n++)

{

sinea[n] = 0;

}

for(n = 122; n < 128; n++)

{

sinea[n] = 0;

}

/*** Generate Windows ***/

if(GenTriangle( window, NSAM) != EDSP_OK)

printf(" Problem With Window Generator ");

/** Multiply Window With Signal **/

if(VectorMult( signal, window, sinea, NSAM, res_shift) != EDSP_OK)

printf("Problem With Vector Mult");

/*** Take FFTs Of Sine Waves ***/

if(FftReal( fftouta, signal, NSAM, scalefft) != EDSP_OK)

printf("Problem With FFT");

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 29

/** Covert FFT result to log magnitude format **/

if(LogMagnitude( outputa, fftouta, NSAMS, scale) != EDSP_OK)

printf("problem With LogMagnitude");

/** Release FFT RAM **/

FreeFft();

}

The above C code listing brings together a number of different DSPLib functions, and

provides a good example of DSP processing on the SH.

The reader should be familiar with the FFT content of the program, having read section

2.0 of this document.

Following the flow of the program, the first task was to generate the signal. Two sine

waves, one of Fs/4 and one of Fs/64 are generated and then added together. The floor

function used in the generation of the sine waves is contained in the math.h library.

This function takes the solution of the equation to its’ right, and rounds it to the nearest

integer that is not greater than it. Hence the function rounds down.

Next, six zeros are deliberately inserted into the start and finish of the data set inorder to

create a sudden end to the signal, and hence generate many harmonics, as shown in

Diagram 3.6. Following that, a window is generated using one of the functions

discussed in this section. The particular window function can simply be changed to test

for best results, as was the case in this example.

A window is applied to the signal on a sample by sample bases. I.e. sample zero of the

signal is multiplied by sample zero of the window, and so on until each element has

been processed. This task is performed by the function VectorMult which is contained

in DSPLib under the miscellaneous section. For more details regarding this function,

the reader should consult section 6.0 of this document, as well as the related Ensigma

documentation.

DSPLib For SH Application Note (19-031/1.0) Window Functions

Hitachi Micro Systems Europe Ltd 30

Once the signal has been windowed, the resultant waveform is applied to the FFT

function. This function is described in detail in section two of this document, and is of

no concern to us here.

The solution from the FFT is converted to log format, and stored in an array for future

use. The FFT chosen was ‘not-in-place’, however an ‘in-place’ implementation would

also be suitable and save memory usage.

The results shown in Diagrams 3.5 through 3.10 were generated by the given C code

running on an SH-1 processor. Clearly for a real application, the signal would not be

generated on chip, but would instead be read in from an external source such as an

ADC. Notes on processing real time signals are included in section 7.0 of this

document.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 31

4.0 Filters

Introduction

This section of DSPLib provides a number of functions which allow the user to perform

digital filtering on discrete time samples. Essentially the library offers three different

filter networks, even though it contains eight filter functions. The three filters are as

follows :-

Finite Impulse Response (FIR)

Infinite Impulse Response (IIR)

Least Mean Squares Adaptive Algorithm (LMS)

DSPLib allows the user to implement the above functions on both blocks of data, or

alternatively on a single sample process which is intended for real time applications.

The accuracy of the LMS algorithm can be further increased by the use of DIIR which is

a double precision IIR implementation allowing the user to specify 32-bit coefficients.

This document is intended to provide the reader with an introduction to the

implementation of digital filters on the SH processor. This chapter details some

background theory on digital filters, but is not intended to form a comprehensive

reference text. A certain level of DSP knowledge is assumed, and if the reader requires

further information, a relevant DSP text should be consulted.

4.1 Filter Background Theory

In order to ensure a full appreciation of the function descriptions later in this section,

some basic filter theory is outlined here. The theory describes the difference between

the filter types, as well as outlining the filter structures used in DSPLib. It is

recommended that the reader follows this text before moving onto the particular

function description of interest.

The digital filter lies at the heart of DSP processing. Digital filters have found

applications in many different areas, and their everyday use is common place. Digital

filters can be found in equipment ranging from compact disc players to mobile

telephones.

Digital filters are implemented with many different structures, and numerous hardware

platforms. Hardware filters are available which allow the coefficients to be loaded in

from ROM, and clearly dedicated DSP cores allow efficient implementation. DSPLib

allows the user to take advantage of all the features you would expect when using a

RISC core, as well as utilising the SH MAC (Multiply and ACcumulate) register, DMA

(Direct Memory Access) controller and on chip ADC capabilities which together make

real time micrcontroller DSP a reality.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 32

4.1.1 Finite Impulse Response Filters (FIR)

DSPLib offers two implementations of FIR filters. The basic operation of these

functions is described later in the routine descriptions.

The FIR filter has a number of distinct advantages, in particular :-

i) FIR filters implemented in direct form (functions of past and present samples), they

are always stable.

ii) The phase response of an FIR filter is exactly linear. A filter that doesn’t have linear

phase will cause a phase distortion on the signal passed through it. Such distortion is

unacceptable in certain applications such as audio, video and various types of data

transmission.

iii) As indicated in the following text, the implementation of an FIR filter is very simple.

The SH series of microcontrollers have an architecture that is particularly suited to the

implementation of such filters. The heart of the FIR filter is the MAC operation, which

the SH series can perform in three CPU clock cycles.

The FIR filter is described in the time domain as follows :-

yn hkxn k

() ()( )

=−

−

∑

Taking the Z transform of the above expression enables us to represent the FIR filter in

the digital domain as follows :-

HZ hkZk

() ()

=−

−

∑

As can be clearly seen, the entire FIR routine consists of multiplying past and present

samples with the filter coefficients h(k), and summing the solutions. Hence the

implementation of the filter is entirely MAC operations and delays.

DSPLib implements the FIR filter by way of a transversal structure. This structure is

shown below in diagram 4.1.

h(0) h(1) h(2) h(3) h(4) h(5) h(6)

Z-1 Z-1 Z-1 Z-1 Z-1

Z-1

x(n)

y(n)

Diagram 4.1 :- FIR Transversal Structure

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 33

Diagram 4.1 shows the FIR filter structure as implemented by DSPLib. x(n) represents

the input samples, and y(n) the resulting filtered output. The h(n) elements represent the

filter coefficients, and it is these that determine the filters frequency response. For the

DSPLib implementation, the coefficients should be 16-bit signed values. It is important

to remember this when designing the filter as the quantisation will have a large

influence on the overall response of the filter.

DSPLib doesn’t support any filter design methods. It is required that the user determine

their own coefficients either by traditional hand calculation methods, or alternatively the

use of one of many commercially available software design packages. The later is

recommended since different design methods can be tried, and also the analysis of finite

word length effects is often possible by the use of simulation before the actual filter is

implemented on the SH-1.

The implementation of an FIR filter, or indeed any type of digital filter depends upon

the intended use. DSPLib provides two FIR routines, both implemented using a

transversal structure. The difference in the routines is the execution. One routine takes

a finite block of data and performs the filter operation upon it, whilst the second routine

filters the data on a sample by sample process. Clearly the latter is more suited to real

time applications, in particular situations where the sampling rate introduces harsh time

constraints. More detail regarding the two functions is included in the routine

description section of this chapter.

Because the FIR filter has no feedback, the filter is inherently stable. However, a

drawback to this advantage is that to gain a particular performance criteria, an FIR will

be of a higher order to that of the IIR. Therefore, in general unless any of the properties

described at the start of this section are to be exploited, the IIR filter is used. The IIR

filter is described below in 4.1.2.

4.1.2 Infinite Impulse Response Filters (IIR)

DSPLib offers four implementations of the IIR, all of which are described later in the

routine description section of this chapter.

The IIR filter has feedback, and it is this feedback which leads to the filters strengths

and weaknesses. The IIR requires fewer coefficients than FIR for the same set of

specifications. IIR applications are normally in areas where sharp cut-off is required,

along with high speed and in turn a high throughput of data. Of course this type of

performance doesn’t come free, and the price we pay for the IIR performance is

decreased stability. Care must be exercised when designing the filter to ensure that the

filter doesn’t become unstable and result in an oscillating output.

The Z plane transfer function for the IIR is as follows :-

Hz YZ

XZ bbZ bZ

aZ aZ

() ()

()

−+

−−

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 34

If we take the inverse Z transform of the above function and then separate out into

difference equations, we arrive at the following :-

wn awn awn xn

yn bwn bwn bwn

() ( ( ) ( ) ())

() ( () ( ) ( ))

=−+−+

=+−+−

01 2

Where w(n) represents a centre tap in the filter, x(n) is the input data and y(n) is the

resulting filtered data. Coefficients for the filter are represented as a and b and are

ordered accordingly when being presented to the filter. The coefficient ordering and

filter scaling is discussed in the routine description section.

All IIR filters in DSPLib are implemented using Direct Form II structure. This structure

corresponds directly to the difference equations given above. The structure of this type

of IIR filter is given below in Diagram 4.2

Z-1

-1

X(Z) Y(Z)W(Z) b0

Diagram 4.2 :- Direct Form II Second Order Biquad

The above diagram shows a Direct Form II second order IIR biquad, which is the

implementation used in DSPLib. If, as in most cases, a higher order filter is required,

the biquads are cascaded. Hence when specifying your IIR filter order, it should be a

factor of two.

The IIR routine descriptions detail the implementation of the filters. Care must be

exercised when specifying shifts applied to the data, but again the routine description

section discusses this issue in detail.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 35

4.1.3 Filter Specification

Although actual filter design methods are not discussed in this document, it is however

important that the reader has an understanding of methods for specifying filters. There

are a number of excellent software filter design packages available, all of which require

the user to specify the required response.

Taking a low pass filter as an example, a filter is typically specified by the following

parameters :-

Pass Band Ripple

Stop Band Attenuation

Transition Width

Filter Order

The frequency response of an ideal low pass filter is shown below in Diagram 4.3. The

frequency response cuts of immediately when we reach the cut off frequency Fc. Of

course in reality, this kind of filter is not realisable, and so certain compromises must be

reached wen arriving at an acceptable specification.

0dB

Gain(dB)

Frequency(Hz)Fc

Desired Filter Response

Diagram 4.3 :- Ideal Low Pass Filter

Because a finite number of filter coefficients are used in practice to specify a filter, and

also the degree of quantisation introduced to both the coefficients and the data, a

number of errors are introduced. Diagram 4.4 shows a practical filter response, and

enables the user to gain an understanding of which parameters need to be specified

when designing a filter.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 36

The order of the filter has a great bearing on the response of the filter, with small order

filters having a poor response.

If a sharp cut off is required, i.e. a small transition width, then the filter order must be

very high. Clearly this introduces a number of factors which are limiting.

Firstly, a high order filter requires a large amount of computation due to its long length,

this in turn introduces two problems. The length of the filter means that a long delay is

introduced as the data ripples through the filter. Secondly, for each sample, a very large

number of computations are required, and so the processing time is increased, hence

reducing the real time capability of the system.

As well as computational problems, large filters introduce another problem. If a large

number of coefficients are used, a large amount of memory is required to store both the

coefficients and previous samples. Memory is expensive, and also introduces a

limitation in speed due to the large number of memory accesses required.

When designing a filter it is necessary for the engineer to reach a number of

compromises. The ideal response is played off against speed and cost, with a suitable

system eventually being derived.

As well as the transition width expanding with a decrease in filter order, several other

effects take place. The pass band region develops ripple, and the stop band attenuation

becomes less. Clearly the designer should have a feel for the acceptable levels of ripple

and attenuation such that an optimised filter may be designed. A practical filter

response showing these effects is shown below in Diagram 4.4.

0dB

Gain(dB)

Frequency(Hz)

Passband Transition

Band Stopband

Pas sband R ipple

Stopband

Attenuation

Diagram 4.4 :- Actual Low Pass Filter Response

As can be seen above, the actual response is very different to that desired. Typically

when specifying a filter, the user indicates the acceptable level of ripple in the passband,

the transition width and the required stopband attenuation.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 37

The software design package will then normally suggest a filter order which would

enable the stated criteria to be satisfied. Clearly if the user requires a filter with low

pass band ripple, a small transition width and high stop band attenuation, a high order

filter is required.

4.2 Routine Descriptions

4.2.1 Fir

The function FIR implements a Finite Impulse Response filter. The FIR filter has a

transversal structure which is described in section 4.1.1 of this document. The related

Ensigma documentation takes the reader through the definition of the function, but for

the readers reference, the function is defined as follows :-

int Fir(output,input,no_samples,coeff,no_coeffs,res_shift,workspace)

Where,

short output[] output samples y

short input[] input samples x

long no_samples number of samples N to be filtered

short coeff[] array containing filter coefficients h

long no_coeffs length of filter, thus number of coefficients K

int res_shift right shift applied to each output

short *workspace pointer to filter memory

Using Fir

the use of the filter routines is relatively straight forward. It is important to initialise the

filter memory (on chip), before calling the routine. The memory is initialised by calling

the routine InitFir. This routine sets aside a section of on chip RAM for the storage of

coefficients and also past samples which are required for FIR computation.

Coefficient Storage

Coefficients should be calculated as 16 bit fixed point. The calculated coefficients

should then be stored in an array which can be seen by the compiler. Assuming that the

array is called coeff, a typical file would follow the following format :-

short coeff[10] = {0,0,12,15,16,16,15,12,0,0};

Clearly when a large number of coefficients are used, it is good practise to store the

coefficients in their own file, and include the array file name in the top of your main

program.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 38

Example

The following example outlines the use of the Fir routine. The example is written in C

and is intended to provide the reader with a typical code layout.

#include <stdio.h>

#include <math.h>

#include <ensigdsp.h>

#include "testhead.h"

/*** Include File Containing data to be filtered ***/

#define NSAM 128

/* number of samples */

#define no_coeff 32

#define MAXSUM 67108864

void main(void)

{

/*** Declare Local Variables ***/

short coeff[no_coeff], filout[NSAM];

short *work;

int n, res_shift;

long sum;

/*** Array Containing Coefficients ***/

coeff[32]={0, 0, 9, -1, 62, -253, 99, -168, 1283, -1017, 75, -3560,

5196, -237, 7506,-26122, 17126, 17126, -26122, 7506, -237, 5196,-3560,

75, -1017, 1283, -168, 99,-253, 62, -1, 9};

res_shift = 16; /*** Shift Applied To Output Data ***/

/*** Check Scaling Of Coefficients ***/

for(n = 0; n < no_coeff; n++)

{

sum += fabs(coeff[n]);

}

if(sum >= MAXSUM)

printf("Bad Coefficients");

/*** Initiate Filter Workspace ***/

if(InitFir(&work, no_coeff) != EDSP_OK)

printf("Problem With InitFir");

/*** Fir Filter The Signal ***/

if(Fir( filout, signal, NSAM, coeff, no_coeff, res_shift, work) ==

EDSP_BAD_ARG)

printf("Filter Problem");

}

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 39

The above C code example shows a typical filter routine for processing blocks of

samples.

In this case, the routine takes a block of 128 samples, and performs a filter operation

on them accordingly. The filter is of length 32, and the pre-calculated samples are

stored in the array coeff[] at the start of the program.

Before the filter is initialised, the coefficients are checked for scaling. It is suggested as

a general rule of thumb that the absolute sum of the filter coefficients is less than 226.

Checking this should help to avoid saturation, as long as the result shift is set

accordingly. Obviously, in an environment where time is an issue, it is desirable to

perform such checks during development, and remove the code for the final version

when the routine is known to be working.

Next, the filter workspace has to be initialised. The routine InitFir() sets aside an area of

on chip RAM for the filter routine to use as a scratch pad. The actual area is undefined,

and will vary on each execution. It is therefore recommended that the user does not

attempt to access this area at any time. When calling this routine, a pointer must be

specified to the area, in our case the pointer ‘work’ is used. This pointer must be

specified when calling the actual filter routine.

Finally in our C code example, the FIR filter routine is executed. The function Fir is

called, with the following arguments:-

filout() = Samples from filter

signal() = Input samples for filter

NSAM = Number of input/output samples

coeff() = Filter coefficients

no_coeff = Number of filter coefficients (length of filter)

res_shift = Shift applied to output

work = Pointer to filter workspace in RAM

These arguments tie in with the Fir definition at the start of this section. If the

arguments are outside of a particular range, the filter will not operate, and will instead

return a bad argument flag. This flag, called EDSP_BAD_ARG is tested in the above

example, and its’ generation in turn is used to generate an error message to the user.

The flag EDSP_BAD_ARG is generated if the following occurs :-

no_samples < 1

no_coeffs <= 2

res_shift < 0

res_shift > 27

4.2.2 Fir1

This routine follows on directly from the above description of Fir, and it is therefore

recommended that section 4.2.1 is read before moving on to this section.

Fir1 is identical to Fir except for the fact that it processes one sample at a time.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 40

This routine is optimised for signal sample computation, and should be used instead of

the above routine when ‘no_samples’ is set to one. The routines definition is outlined

below :-

Fir1(output,input,coeff,no_coeffs,res_shift,workspace)

Where,short *output Pointer to single output sample y(n)

short input single input sample x(n)

short coeff[] array containing filter coefficients h

long no_coeffs number of coefficients K (length of filter)

int res_shift right shift applied to output

short *workspace pointer to filter memory

Fir1 will give bit identical results to Fir, and uses the filter workspace to hold previous

samples that are necessary for computation. The routine finds applications in areas

where real time processing is required, and as a result the filter operation is applied on a

sample by sample bases.

Since thi s function is essent ially identi cal to the Fir routine, coefficients are stated in an

identical manner to that shown in 4.2.1. Also, for the purpose of a software example,

the code shown in 4.2.1 is relevant. For an identical filter as that shown, the above code

may be re-written as follows :-

/*** Fir Filter The Signal ***/

if(InitFir(&work, no_coeff) != EDSP_OK)

printf(“Problem With InitFir”);

for(n=0; n < NSAM; n++)

{if(Fir1( filout, signal[n], coeff, no_coeff, res_shift,

work)==EDSP_BAD_ARG)

printf(“Problem With Filter”)

}

The above example is a ‘snippet’ of code which could replace the filter section of the

example code shown in 4.2.1. The code uses a for loop to filter a number of samples

one at a time. Clearly, in a real time environment, the loop would be omitted, and the

filter operation would be initiated by the end of the ADC conversion, as an example.

It should be noted that before using Fir1, the filter workspace should be initialised by

calling InitFir. This is the same initialisation routine as that used for Fir.

In order to minimise quantisation noise and saturation, the same scaling rules apply as

that discussed earlier, being to keep the absolute sum of the coefficients to a level lower

than 226. The resultant shift should also be set to 26 if overflow is considered to be a

problem, however in most cases this simply isn’t an issue.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 41

4.2.3 Iir

Now we move onto IIR filters. The reader should be familiar with the basic concepts of

IIR filters before reading the following routine descriptions. Section 4.1.2 of this

document takes the reader through the basic ideas behind IIR filters, as well as

explaining the filter structure for implementing IIR, and in particular the Direct Form II,

as used in the DSPLib routines.

As described above, Iir is implemented using Direct Form II biquads. A biquad is a

single second order filter stage which may be cascaded in order to form higher order

filters. Now because second order filters are cascaded to implement the higher order

designs, it is important for the reader to remember that the stated filter order must be a

factor of two.

The Iir function is defined as follows :-

int Iir(output,input,no_samples,coeff,no_sections,

workspace )

where,

short output[] output samples

short input[] input samples

long no_samples number of samples to be filtered

short coeff[] filter coefficients

long no_sections number of second order sections

short *workspace pointer to workspace

Coefficient Storage

The definition is similar to that shown in the Fir descriptions earlier. However, due to

the differences in structure, the Iir coefficients are specified in a very different manner.

The filter coefficients are stored in a single array. By looking at the transfer function of

a single second order biquad, we can see that there are two groups of coefficients for

each stage, being an and bn.

Hz YZ

XZ bbZ bZ

aZ aZ

() ()

()

−+

−−

Hence, as can be seen from the transfer function, for each second order biquad, the user

must specify five coefficients. However, another factor is thrown into the equation

which further complicates things. The Iir functions in DSPLib can have a shift applied

to them at each stage, and the shift applied is stated in the same file as the coefficients.

The Iir difference equations for Iir should make things clearer.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 42

The equations are as follows :-

wn awn awn xn

yn bwn bwn bwn ak

() ( ( ) ( ) ()).

() ( () ( ) ( )).

=−+−+

=+−+−

−

12 15

01 2

122

If you are not familiar with these equations for the second order biquads, you should

read section 4.1.2 before preceding here.

As can be seen above, each of the two ‘a’ coefficients have a 15 bit right shift applied,

which means that the ‘a’ coefficients MUST be in Q15 format in order to ensure

accurate results. The ‘b’ coefficients are subject to a user specified shift, which as

described is contained in the same array as the coefficients. I.e. in the above equation if

the’b’ coefficients are in Q15 format, a0k should be set to 15. Hopefully, it therefore

becomes clear that the a0k term describes a shift rather than a coefficient. The k part of

the term describes which cascaded biquad the term belongs to.

Hence, if two biquads were used, the coefficients would be specified as follows :-

{a00,a10,a20,b00,b10,b20,a01,a11,a21,b01,b11,b21}

It should now be clear that in the above array, the terms a00 and a01 are not coeffi cients,

they represent a right shift which is applied to the ‘b’ coefficients.

Example

There now follows a basic example of how the Iir function is typically used. The

coefficients have been designed on a separate package and have been quantised to 16 bit

resolution. The ‘a’ coefficients are in Q15 format, and the ‘b’ coefficients are in Q15

format, with the shift set to 15 as well, thus returning the solution in the correct form.

The C code example is as follows :-

#include <stdio.h>

#include <math.h>

#include <ensigdsp.h>

#include "testhead.h"

/*** Include File Containing data to be filtered ***

#define NSAM 128 /* number of samples */

#define no_sections 2

#define no_coeff 12

void main(void)

{

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 43

/*** Declare Local Variables ***/

short coeff[no_coeff], output[NSAM];

short *work;

/*** Array Containing Coefficients ***/

coeff[no_coeff]={15,-29186,-11663,9008,18009,9008,15,-27423,

28914,14004,27934,14004};

/*** Initiate Filter Workspace ***/

if(InitIir(&work, no_sections) != EDSP_OK)

printf("Problem With InitFir");

/*** IIR Filter The Signal ***/

if(Iir( output, signal, NSAM, coeff, no_sections, work) ==

EDSP_BAD_ARG)

printf("Filter Problem");

}

The above C code takes the reader through a very simple example of how to integrate

the Iir function within a program. The coefficients are included in the program for

clarity, but could just as easy be in a different file that is pointed to by an include

statement. The two a0 values of 15 can be easily seen in amongst the coefficients, which

constitute a shift right.

The filter workspace is initiated by calling InitIir. In a time critical system, this

operation should be performed during the initial set-up, and not included in the main

loop. The filter only has to be initialised once for a particular filter size, and is

completely independent of the number of function calls made to Iir.

The final part of the program details the actual calling of the function. The data is taken

from the array ‘signal’, which is omitted from the above code. In reality, ‘signal’ would

be a series of time domain samples stored in its’ own file pointed to by an include

statement. The function is tested for bad arguments, and returns an error message if

there is a problem.

4.2.4 Iir1

This function follows directly on from the above description of Iir. Iir1 offers an IIR

filter which performs a filter operation on a single sample, thus making it particularly

suitable for real time operations. The function is defined as follows :-

int Iir1( output, input, coeff, no_sections, workspace)

Where,short *output pointer to filtered output sample

short input input sample

short coeff[] filter coefficients

long no_sections number of second order filter sections

short *workspace pointer to start of filter workspace

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 44

The function is used in exactly the same manner as Iir. Filter coefficients are specified

in the same format as that described in 4.2.3, and the method of applying a shift to the

output is also identical. When the user wishes to use Iir with no_samples set to one, this

function should be used instead. Iir1 is optimised for single sample operation, and will

offer some computational savings over Iir in this particular situation.

The software example given in the previous example is directly relevant here. For the

same routine to be implemented using Iir1, the following change would be made.

/*** IIR Filter The Signal ***/

for( n=0; n <= NSAM; n++)

{

if(Iir1(output,signal, coeff, no_sections, work) == EDSP_BAD_ARG)

printf("Filter Problem");

}

I.e the filter section of the routine has been replaced by a for loop which repeatedly runs

the Iir1 routine. Clearly in a real application it is unlikely that the function would be

used in this manner, but it serves as an example of the function operation. This routine

will return bit identical solutions to the example using Iir.

Before calling Iir1, it is important to initialise the filter workspace. Although the filter

performs computation on one sample at a time, the filter must store previous input and

output samples in order to perform the filter operation. Therefore, as with all the filter

routines in DSPLib, the user must initialise the filter workspace. When using Iir or Iir1

the function InitIir must first be called. When calling this routine, the user must provide

a pointer to the start of the workspace, and also indicate the size of the filter. The

example in section 4.2.3 shows the use of InitIir, and is directly relevant to the use when

calling Iir1.

4.2.5 DIir

This function implements an IIR filter with a direct form II structure. The filter is made

up of a series of cascaded second order biquads. If the reader is not familiar with the

structure of an IIR filter, they must consult section 4.1.2 before continuing here.

This routine offers a filter which is essentially the same as that offered by Iir, except that

it allows double precision computation. This added precision is achieved by the

inclusion of 32 bit coefficients. Input and output data is still 16 bit, but the added

precision of the coefficients will increase the accuracy of the filter in situations where

accuracy is an issue and could possibly affect filter stability.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 45

The related Ensigma documentation explains how the double precision is realised, and it

is the purpose of this document to demonstrate how to use the routine. DIir is defined as

follows :-

int Diir( output, input, no_samples, coeff, no_sections,

workspace)

Where,

short output[] filter output samples

short input[] filter input samples

long no_samples number of samples to be filtered

long coeff[] 32 bit filter coefficients

long no_sections number of second order sections (biquads)

long *workspace pointer to filter workspace

Using DIir

The use of DIir is essentially the same as using Iir, only the coefficients are 32 bit.

Before calling DIir it is important to initialise the filter workspace. This is achieved by

calling the function InitDIir. When calling this function, the user must specify the

number of second order sections in the filter, and also provide a pointer to the area in

RAM which this function sets aside for the filters use. The reader can see this function

in the C code example below. It is important that the user doesn’t try to access the filter

workspace at any point for two reasons. First of all, the area is undefined, and so the

user has no idea of the location or size of the area before or during execution.

Secondly, the filter will access the RAM to store past samples. The user does not have

access to the method of storage, and so intervention could result in the failure of the

filter.

Coefficient Storage

Since this function implements an IIR filter, the method of storing coefficients is similar

to that shown in section 4.2.3. However, due to the double precision of the coefficients,

the method of applying shifts to the results is different. In order to understand the filters

operation, we must first look at the double precision filter difference equations.

wn a d n a d n xn

kk kk

( ) [ ( ) ( ) ( )].

=−+−+ −

12 31 31

1222

yn bdn bdn bdn

kk kk kk a

( ) [ ( ) ( ) ( )].

=+−+−

−

01 2

122

The above equations are essentially the same as those given for the Iir function. The

coefficients are stored in a single array in the same format, which is outlined below.

{a00,a10,a20,b00,b10,b20,a01,a11,a21,b01,b11,b21............}

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 46

The area where this function differs is the specification of shifts applied to outputs of

biquads. As the difference equations show, the ‘a’ coefficients must be specified in Q31

format. The output of each biquad can be shifted by the use of a0k and it is important

that the reader fully understands how the shifts work.

When the 16 bit input x(n) is applied to the filter, it is firstly multiplied by 216 to put it

into Q31 format which is suitable for the filter arithmetic. This shift must be

compensated by the user at the output in order to gain accurate results. The following

two block diagrams should help to clarify matters.

216 22

-31 -31

31 15

a1,a2

Coeffs b0,b1,b2

Coeffs

D Iir Filt er

x(n) y(n)

Diagram 4.5 :- Shifts Applied In A Single Second Order Biquad

Referring to the above diagram, the method of applying shifts to a single second order

biquad can now be explained.

The input samples, x(n) are immediately multiplied by 216 by the function, the user has

no control over this. Next, all the ‘a’ coefficients must be specified in Q31 format, as

they are multiplied by 2-31 by the filter function, and again the user has no control over

this. Finally we introduce the ‘b’ coefficients, which are multiplied by a user specified

shift. This shift, a0k must not only compensate for the format of the ‘b’ coefficient s, but

also compensate for the original shift applied to the input data.

In the diagram above (4.5), the coefficients are stored in Q15 format. The final result

shift applied is therefore 2-31. The reason for this is as follows : The coefficients are in

Q15 format, which means that they have been multiplied by 215 to put them in correct

form. Also the shift has to compensate for the original 216 shift applied at the start of

the process. Therefore we arrive at the required shift of 2-31 which will compensate for

both shifts and return the solution in the correct format, whilst allowing the processor to

perform double precision computation.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 47

If the filter order is higher than two, i.e. more than one biquad is used, the compensation

doesn’t have to be applied until the last stage, thus allowing the intermediate stages to

be expressed in Q31 format.

216 22

-31 -31

31 31

a1,a2

Coeffs b0,b1,b2

Coeffs

DIir Filter

x(n)

DIir Filter

2-31

231

a1,a2

Coeffs

b0,b1,b2

Coeffs

-47

Diagram 4.6 :- Shifts Applied In A Fourth Order Filter

The function allows the final output stage shift to be expressed right up to and including

48. This means that on a fourth order system and higher, the filter can be a true dual

precision. In the example above, it is clear to see that the final shift applied is 47, which

compensates for the Q31 format of the ‘b’ coefficients, and also compensates for the

original 216 shift applied at the start. It is very important to remember that such a large

shift is only possible on fourth order filters and higher, and as a result, it is not possible

with this function to implement a true second order double precision filter.

Example

The best way to reiterate the above theory is by way of an example. This example is

written in C as are all the examples in this text. The example shows the array containing

the filter coefficients, as well as the function calls to implement the filter. The filter

coefficients were calculated on a design package, and implement a fourth order low pass

filter. The values obtained from the package were as follows :-

Design Package Solution Q31 Format

a10 = 0.6149292 1320550402

a20 = 0.4001465 859308065

b00 = 0.1794281 385318910

b10 = -0.3097076 -665092006

b20 = 0.1794281 385318910

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 48

a11 = -0.2411804 -517930965

a21 = 0.8682861 1864630202

b01 = 0.7138977 1533083637

b11 = -0.7098694 -1524432929

b21 = 0.7138977 1533083637

Hence the result of the filter design operation is shown on the left, and the Q31 format

value is shown on the right. These values are then used in the double precision filter

example outlined below. The reader should take particular notice of the coefficient

array, and especially the a0 values which describe the shift applied during the filter

operation.

#include <stdio.h>

#include <math.h>

#include <ensigdsp.h>

#include "testhead.h"

/*** Include File Containing data to be filtered ***/

#define NSAM 128 /* number of samples */

#define no_sections 2

#define no_coeff 12

void main(void)

{

/*** Declare Local Variables ***/

short coeff[no_coeff], output[NSAM];

short *work;

/*** Array Containing Coefficients ***/

coeff[no_coeff]={31, 1320550402, 859308065, 385318910, -665092006,

385318910, 47, -517930965, 1864630202, 1533083637, -1524432929,

1533083637};

/*** Initiate Filter Workspace ***/

if(InitDIir(&work, no_sections) != EDSP_OK)

printf("Problem With InitDIir");

/*** DIIR Filter The Signal ***/

if(DIir( output, signal, NSAM, coeff, no_sections, work) ==

EDSP_BAD_ARG)

printf("Filter Problem");

}

The above example filters a set of data samples 128 long using the double precision IIR

filter. Before calling the filter routine, the filter workspace is initialised by calling

InitDIir. This function requires two arguments, one being a pointer to the workspace,

and the second being the number of second order sections the filter contains, thus

indicating to the function the size of RAM required.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 49

The filter coefficients are stored in an array in the main program, and this should clearly

demonstrate the shifts which are required. All the coefficients are in Q31 format (the ‘a’

coefficients must be in Q31 format), and the two shifts applied are 31, and 47. The shift

of 47 compensates for the 16 place shift applied at the input as described earlier.

The actual filter routine is called at the end of the program, and will filter the block of

128 samples, and then finish. The actual calling of the routine and arguments used is

identical to Iir described in 4.2.3, except that the solution although still 16 bit is of a

higher precision.

4.2.6 DIir1

This function offers exactly the same filter operation as that described in 4.2.5. The

function implements a double precision IIR filter, but only processes a single sample at

a time. This makes DIir1 particularly suitable for real time operation where the user

wishes to perform computation on a sample by sample bases. The function is defined as

follows :-

int DIir1( output, input, coeff, no_sections, workspace)

Where

short *output pointer to output sample

short input input sample

long coeff[] array containing 32 bit coefficients

long no_sections number of second order sections

long *workspace pointer to start of filter workspace

The filter is used in exactly the same manner as DIir, and the reader should consult

section 4.2.5 for full details of the operation.

The method for specifying the filter coefficients is also exactly the same, i.e. they are all

32 bit and require the same methods of applying shifts as that described earlier. If the

user wishes to use DIir with ‘no_samples’ set to one, this function should be used

instead. DIir1 has been optimised for single sample computation and will provide an

increase in speed compared to DIir.

To modify the C code example given in the previous section such that DIir1 can be

used, the following change would be made.

/*** DIIR Filter The Signal ***/

for( n=0; n <= NSAM; n++)

{

if(DIir1(output,signal, coeff, no_sections, work) == EDSP_BAD_ARG)

printf("Filter Problem");

}

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 50

As should be clear from the example code above DIir1 is the same as DIir only the user

does not have to specify the number of samples to be processed since it only processes

one at a time.

Before calling DIir1, it is essential that the filter workspace is initialised.

This is achieved in exactly the same manner as that for DIir, that is by calling InitDIir

and specifying the number of second order sections the filter contains, and also

providing a pointer to the assigned memory location. The use of this initialisation

function can be seen in the software example in section 4.2.5.

The code above is intended only to indicate the operation of the function. Imbedding

the function in a For loop will actually increase the run time compared to the use of DIir,

and so is not recommended. However, for evaluation purposes, the user will find that

both DIir and DIir1 will return bit identical results.

Even though the filter only processes one sample at a time, it is necessary to store past

samples to feedback into the filter. These samples are stored in the filter workspace,

and the very oldest samples are discarded as the number of single sample computations

increase.

As a final note in this section it should be pointed out to the user that using the double

precision routines will most probably not be necessary, and will impose a time

constraint. In most applications, the accuracy of the IIR functions will be suitable, and

due to the simpler nature of the implementation, the IIR functions run an order of

magnitude faster on the SH1. The use of the double precision filters should be

contained to areas where the user feels that single precision will adversely affect the

filter operation and lead to problems such as instability. As an example of the speed

difference, on a 10MIPS SH-1, Iir1 takes 18.1uS/sample where as DIir1 takes

48.0uS/sample.

4.2.7 Lms

This is the final filter structure offered by DSPLib, and comes in two forms LMS and

LMS1. This section describes the background behind the LMS filter, and also includes

an example of the routine performing a filter operation.

LMS stands for Least Mean Squares, which in turn describes the algorithm which is

used in this adaptive filter. The LMS adaptive filter is based around an FIR filter, and it

is assumed that the reader is familiar with its’ basic operation. If this is not the case, the

reader should consult section 4.1.1 of this document which explains the basic concepts

behind the FIR, as well as showing the transversal structure which is incorporated into

this adaptive algorithm.

The function Lms is defined as follows :-

int Lms( output, input, ref_output, no_samples, coeff,

no_coeffs, res_shift, conv_fact, workspace)

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 51

Where,

short output output samples y(n)

short input input samples x(n)

short ref_output desired output values d(n)

short no_samples number of samples to be filtered

short coeff[] filter coefficients h(n)

long no_coeffs number of coefficients

int res_shift right shift applied to output

short conv_fact convergence factor 2µ

short workspace pointer to filter workspace

The LMS routine provides a filter that will adapt to a reference signal given to it. This

means that no filter design is necessary, the filter will adjust its own coefficients using

the LMS algorithm such that the reference signal provided to it is included from the

main filter signal.

As an example, if a filter is supplied with a reference sine wave of frequency Fs/4, and

an input signal containing a number of sine waves such as Fs/2,Fs/4,Fs/8 and Fs/16, the

output of the filter would be Fs/4. Hence the filter has adapted its coefficients such that

the Fs/4 component of the signal is included.

The LMS filter is mathematically described as follows :-

FIR Filter :-

yn h kxn k

() ()( )

=−











−

∑

LMS Adaptive Algorithm :-

hkhk enxnk

=+ −

12() () ()( )

Where h(k) are the filter coefficients which are adapted. µ determines the convergence

rate, i.e. the rate at which the coefficients converge and provide the required output.

Clearly the faster the convergence rate, the less accurate the output. It is up to the user

to arbitrarily test different values until a suitable response is achieved.

The same scaling conventions apply to this filter as with the Fir routine. However, it is

not guaranteed that the adaptive algorithm will keep to the convention, and saturation

may occur. If this is the case, it is most likely that the filter is having trouble adapting,

and the filter order should be increased to rectify this.

Diagram 4.7 shows a block diagram showing the implementation of the filter.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 52

Z-1 Z-1 Z-1 Z-1 Z-1

Z-1

x(n)

y(n)

N+1 Weights

Desired Output

A d ap tive Algorith m

d(n)

e(n)=d(n)-y(n)

(Output)

(Input)

(Reference)

F I R F i lte r

Diagram 4.7 :- LMS Filter Block Diagram

With reference to the above diagram, the area within the grey block is the standard FIR

filter implemented in a transversal structure. The output of the filter is subtracted from

the desired signal output (reference) which in turn generates an error signal e(n). This

error signal is used by the adaptive algorithm to update the filter weights (coefficients)

in order to reduce the error signal, and thus eventually arrive at the desired filter

response. Initially, when the filter hasn’t started to adapt, the filter coefficients are set to

an arbitrary value. The value chosen should not make a difference to the final value

reached, however, it may have some affect on the rate at which it meets the ideal

weights.

Wopt

Optim al Weights

Desi red Conv e r gence

Actual Convergence

Diagram 4.8 :- Convergence Of Filter Weights

The above diagram illustrates the variations in the filter weights as the feedback adjusts

them accordingly. Although the final weights are very close to the desired values, they

never actually settle there, but will instead fluctuate around the optimum value.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 53

Example

Now that the basic theory has been covered, an example should help to reiterate any

problem areas. The example takes an input signal which consists of five sine waves

added together, and adaptivly filters all but one of the components. The reference input

is the frequency component that is left at the end. The C code is given first, followed by

an in depth explanation of the process, including the analysis of various filter lengths.

#include <stdio.h>

#include <math.h>

#include <ensigdsp.h>

#define NSAM 128 /* number of samples */

#define TWOPI 6.283185307

#define no_stage 1

#define MAXSIZE NSAM

#define NSAMS (NSAM/2)

#define TWOMU 32767

#define ncoeff 8

/***************** Exercise To Implement An LMS Adaptive Filter

*******************/

void main(void)

{

short coeff[ncoeff], ref[NSAM], noise[NSAM], noisea[NSAM],

fftout1[NSAM], sine1[NSAM], sine2[NSAM], sine3[NSAM], sine4[NSAM],

sine5[NSAM], signal[NSAM], output1[NSAMS], filout[NSAM];

int n, k, loop, res_shift;

long sum, len;

short *work;

len = NSAM;

res_shift = 16;

loop = 100;

/** Coefficients For Adaption **/

coeff[ncoeff]={1,1,1,1,1,1,1,1};

/*** Generate Sine Waves ***/

printf("Generating Sine Wave");

k = len / 8;

for(n = 0; n < len; n++)

{

sine1[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5);

}

k = len / 16;

for(n = 0; n < len; n++)

{

sine2[n] = floor(6553 * sin(TWOPI * k * n / len) + 0.5);

}

k = len / 4;

for(n = 0; n < len; n++)

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 54

{

sine3[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5);

}

k = len / 32;

for(n = 0; n < len; n++)

{

sine4[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5);

}

k = len / 64;

for(n = 0; n < len; n++)

{

sine5[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5);

}

printf("Done");

/*** Create Signal To Be Filtered By Adding Sine waves ***/

for(n = 0; n < len; n++)

{

signal[n] = (sine1[n] + sine2[n] + sine3[n] + sine4[n] +

sine5[n]);

ref[n] = (sine3[n] );

}

if(InitLms(&work, ncoeff) != EDSP_OK)

printf("Problem With InitFir");

/*** Perform Adaptive Filter Operation ***/

/*** Adapt coeffs 100 times ***/

for(n=0; n < loop; n++)

{

if(Lms(filout, signal, sine3, NSAM, coeff1, ncoeff, res_shift,

TWOMU, work) == EDSP_BAD_ARG)

printf("Problem With LMS Operation");

}

free(work);

}

The above code performs two tasks. First of all the code generates the signals which are

to be filtered. Clearly this is for test purposes only, and in a real system, the data would

come from a device such as an ADC. The signal to be filtered consists of five sine

waves all added together. The corresponding frequencies of these signals are Fs/4, Fs/8,

Fs/16, Fs/32 & Fs/64. The sum of these waves forms the data set to be filtered,

consisting of 128 samples. Next, the reference signal is generated, which is chosen to

be Fs/4. This means that the coefficients should adapt and remove all bar the Fs/4

component from the signal.

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 55

Next step in the code is to initialise the filter workspace by calling the function InitLms.

This function is used in exactly the same manner as the previously described

initialisation routines. So finally in the code, we arrive at the actual filter operation.

The function is called 100 times in order to demonstrate how well the coefficients adapt.

The calling of the function is fairly self explanatory, and should not cause any problems.

The function Lms modifies the filter coefficients stored in coeff[], and as a result they

should be stored in on chip RAM, thus allowing fast access. If the coefficients are

stored off chip, it is a good idea to copy them into on chip RAM. The spectrums of the

signals involved in the filter operation are displayed below.

Diagram 4.9 shows the five sine waves added together which forms the input to the

adaptive filter. This signal was then adaptivly filtered as described above with eight

coefficients. The value of 2µ was chosen to be 32767, and wasn’t changed during the

whole of the testing period. The user may wish to start off with a similar value in order

to establish operation, then change accordingly in order to achieve faster convergence.

The result of filtering with eight coefficients or weights is displayed in figure 4.10. As

can be seen, the unwanted elements of the signal have been attenuated, but not removed.

The reason for this is that the small filter order physically cannot apply the attenuation

required to remove the unwanted elements.

FFT Of Input Signal

-10

1 4 7 10131619222528313437404346495255586164

Diagram 4.9 :- Spectrum Of Input Signal To Adaptive Filter

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 56

FFT Of Signal Filt ered By 8 Coefficient s

-10

1 4 7 10131619222528313437404346495255586164

Diagram 4.10 :- Spectrum Of Filtered Signal Using 8 Coefficients

As can be seen in Diagram 4.10, the filter has adapted to the reference signal, but has

failed to filter the unwanted components, even after one hundred operations. Therefore,

for the purposes of demonstration, the number of coefficients, i.e. the FIR filter length is

increased to thirty two. This means that in the software example above, ncoeff = 32,

and the array coeff consists of thirty two elements each containing one.

With the increased coefficient s, the program was ran, and the results shown in Diagram

4.11 below. It is clear to see that the increase in coefficients increased the selectivity of

the filter, and was more capable to reject the unwanted components from the signal.

In a real application, it is good practice to try different filter lengths on a trial and error

basis, thus enabling the user to arrive at the most efficient implementation. Clearly

large filters are bad news since they require more memory, and as a result run slower.

FFT Of Result Of Filtering Wit h R ef = Fs/4

-10

1 4 7 10131619222528313437404346495255586164

Diagram 4.11 :- Result Of Adaptive Filtering With 32 Tap Filter

DSPLib For SH Application Note (19-031/1.0) Digital Filters

Hitachi Micro Systems Europe Ltd 57

As can be clearly seen above, with a filter length of 32, the unwanted signal components

have been completely removed. It is therefore unnecessary to use a filter of a higher

order for this particular application.

4.2.8 Lms1

This function is the final one to be discussed from the filter section of DSPLib. The

function is identical in every way to the function Lms, and the in depth discussion in the

previous section is directly relevant here. The reader should consult both section 4.1.1

on FIR theory, and also section 4.2.7 detailing adaptive filters.

Lms1 provides an algorithm which implements the Least Mean Squares adaptive

algorithm on a single sample. Hence the function finds particular application in real

time processing.

The function Lms1 is defined as follows :-

int Lms1( output, input, ref_output, coeff, no_coeffs,

res_shift, conv_fact, workspace)

Where,

short *output output sample y(n)

short input input sample x(n)

short ref_output desired output value d(n)

short coeff[] filter coefficients h(n)

long no_coeffs number of coefficients

int res_shift right shift applied to output

short conv_fact convergence factor 2 µ

short workspace pointer to filter workspace

By examining the filter definition it should be clear to the reader that the only difference

between the two functions is that no_samples has been omitted. In applications where

Lms is used with no_samples set to one, Lms1 should be used instead. Lms1 has been

optimised for single sample operation, and will provide time savings over the use of

Lms. If more than one sample at a time requires sampling, Lms should be used since it

is quicker than using Lms1 in a for loop.

Before calling Lms1, the filter workspace should be initialised. This is achieved by

calling the function InitLms. This is the same routine as that used to initialise the Lms

filter workspace. The workspace is used by the filter to store past samples which are

required for computation. It is important that the user makes no attempt to access this

RAM at any time during filter execution in order to ensure filter accuracy and speed.

Because of the adaptive algorithm, Lms is a degree slower than Fir to operate. It is

therefore a possibility that the Lms routine can be used to generate filter coefficients

during development, and once finished can be inserted into a fixed FIR in order to speed

up implementation. Clearly if constant adaptive filtering is required this method is not

suitable.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

5.0 Convolution and Correlation

Introduction

DSPLib offers five functions in this section, two related to correlation and three related

to convolution.

This text offers the reader a brief introduction to the theory behind convolution and

correlation, before describing each function individually. The background theory is

intended only as an introduction, and a relevant DSP text should be consulted for any

additional information required.

The five functions offered by DSPLib are as follows :-

ConvComplete

ConvCyclic

ConvPartial

Correlate

CorrCyclic

The differences between each function are described in the routine descriptions section

of this report (5.2).

Applications of these functions include areas such as image processing where two

images are compared for similarity, and also range and distance finding in areas such as

sonar and radar.

5.1 Background Theory

5.1.1 Convolution

The simplest way to describe convolution is to state that convolution in the frequency

domain is equivalent to multiplication in the time domain. The equation for convolution

is given below :-

ym wixm i

Wres shift

() ()( ). _

=−











−−

∑

It should be noticed that this identical to the expression used to describe an FIR filter.

The operation of such a filter is that the input samples are convolved with the filter

coefficients and result in a filtered output. The convcomplete function will return bit

identical results to fir, when used in this manner.

When not used for filtering, the convolution function convolves the two input data sets

and returns a solution in y(m). The solution returned describes how an input signal

interacts with the system in order to generate a particular output.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

Relating correlation and convolution together is fairly simple. The convolution of two

data sets is identical to the cross-correlation of one data set with the time reversal of the

second data set relative to that used in convolution. Correlation is described below, and

is important in the understanding of convolution.

5.1.2 Correlation

Correlation essentially highlights similarities in data sets, and as a result finds

applications in areas such as robotic vision and range finders. The equation for

correlation is given below :-

ym wixi m

Wres shift

() ()( ). _











−−

∑

The two data sets w(i) and x(i) are correlated, and the output y(m) will increase in

amplitude when the two inputs exhibit similarities. Hence, in range finder applications,

the transmitted pulse is correlated with the received pulse. A peak will occur when the

transmitted data is received, and thus enables the distance to be computed.

Alternatively, the output can decrease in amplitude, indicating a negative correlation, i.e.

similar signals that are out of phase.

Correlation can be used to examine the properties of a given data set. If a set of data is

correlated against its self, the process is referred to as cross correlation. This method

can be used to remove noise from a periodic signal, and is shown by way of an example

later in this text.

The library offers a number of different variations of the above two functions.

Essentially, the same task is performed, and its only the range over which the

computation is computed that differs. The actual differences are described below in the

routine descriptions section (5.2), and also includes a number of examples written in C.

The theory behind convolution and correlation is beyond the scope of this text. The

descriptions and examples that follow should provide the reader with an outline of the

subject, and should help to allow the user to access the functions operation without

having to become involved with complex general mathematical descriptions often found

in text books. It should also help to highlight typical application areas, thus presenting

ideas for its use.

Although the example programs are shown written in C, it is recommended that the

optimised source code be used for the actual DSPLib function calls. The use of C

compiled code for DSP functions is not suitable, and will introduce a large time

overhead. All the optimised DSPLib functions are included in the file ensigma.lib. This

file should be included in the link command file, when linking the programs together.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

5.2 Routine Descriptions

5.2.1 ConvComplete

This function offers complete convolution between two data sets. The function is

essentially the same as Fir, and will return identical results given the same data. The

function is defined as follows :-

int ConvComplete(output, iw, ix, iw_size, ix_size,

res_shift)

where,

short output[] output data (size W+X-1)

short iw[] input w

short ix[] input x

long iw_size size of w (size W)

long ix_size size of x (size X)

int res_shift right shift applied to output

This function completely convolves the input data sets, and inserts zeros at the start and

end of the two arrays to enable this to take place.

The function is mathematically described as follows :-

ym wixm i

Wres shift

() ()( ). _

=−











−−

∑

1201

≤<+−

mWX

In order to ensure correct operation, a number of argument checks are performed, and

are as follows :-

iw_size < 1

ix_size < 1

res_shift < 0

res_shift > 27

The flag EDSP_BAD_ARG is returned, and the function aborted if a bad argument is

detected.

As is the case with most of the DSPLib examples, optimum performance will be

obtained if the program and data are located in on chip memory. Clearly the ability to

achieve this depends upon the Microcontroller used, and program size. It is

recommended that at the very least one of the input arrays be located in on chip RAM,

which will help to avoid bus conflicts.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

By way of an example of complete convolution, if the data set ‘1234’ is to be

completely convolved with the set ‘5678’, the following shifts would be performed :-

00001234000 ⇐ Held static

00000005678 ⇐ 1st convolution

00000056780 ⇐ 2nd convolution

00000567800 ⇐ 3rd convolution

00005678000 ⇐ 4th convolution

00056780000 ⇐ 5th convolution

00567800000 ⇐ 6th convolution

05678000000 ⇐ 7th convolution

Note how the insertion of zeros enable the two data sets to be completely convolved.

One data set is shifted relative to the other, and the convolution is taken.

This example also indicates how the size of the output array is larger than the input

arrays. In the case of ConvComplete, the output array size should be W+X-1. It is

essential that this is ensured since failure to do so will result in the corruption of the

stack.

5.2.2 ConvCyclic

ConvCyclic offers a function which cyclically convolves two data sets and returns the

result in a third array. The function is defined as follows :-

int ConvCyclic( output, iw, ix, size, res_shift)

Where,

short output[] output data

short iw[] input w

short ix[] input x

long size size of both input arrays (N)

int res_shift right shift applied to output

Because the two input arrays are cyclically convolved, they must be the same size. It is

up to the user to zero pad the smaller of the two arrays if their sizes are different. Cyclic

convolution is mathematically described as follows :-

ym wixm i NN

Nres shift

() ()( ). _

=−+











−−

∑

120

≤<

In order to ensure correct operation, a number of argument checks are performed, and

are as follows :-

size < 1

res_shift < 0

res_shift > 27

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

If any of the above are true, the function aborts and returns the flag EDSP_BAD_ARG.

As is the case with most of the DSPLib examples, optimum performance will be

obtained if the program and data are located in on chip memory. Clearly the ability to

achieve this depends upon the Microcontroller used, and program size. It is

recommended that at the very least one of the input arrays be located in on chip RAM,

which will help to avoid bus conflicts.

Clearly the three convolution functions offered by DSPLib perform the same

mathematical operation between two sets of data. The way in which they differ is in the

manipulation of arrays, and in the results that are included in the output. In the previous

example, the two arrays were completely convolved, and zero padded to ensue that all

results were obtained. Cyclic convolution is different to that, and is now shown in an

example.

Assume that the two arrays ‘1234’ and ‘5678’ are two be cyclically convolved. As

before, one of the arrays is held static, and the other shifted around it. After each shift,

the two arrays are convolved. This is performed until every element is convolved with

every other element. E.g.

1234 ⇐ Held Static

5678 ⇐ 1st convolution

8567 ⇐ 2nd convolution

7856 ⇐ 3rd convolution

6785 ⇐ 4th convolution

Hence it should become clear that the output array is the same size as the input arrays.

In the above example, 5678 was rotated four times in order to perform cyclic

convolution. This is shown in the diagram below, where the outer circle is rotated.

R o tate O ute r C irc le

Diagram 5.1 :- Cyclic Rotation

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

5.2.3 ConvPartial

ConvPartial offers partial convolution between two data sets. The function is essentially

the same as ConvComplete, but will not include results that are obtained outside the

data sets. The function is defined as follows :-

int ConvPartial( output, iw, ix, iw_size, ix_size,

res_shift)

Where,

short output[] output data (size X-W+1)

short iw[] input w

short ix[] input x

long iw_size size of w (size W)

long ix_size size of x (size X)

int res_shift right shift applied to output

The input arrays can be different siz es, however, the solution does not include elements

derived from outside the array boundaries.

The function is mathematically described as follows :-

ym wixm W i

Wres shift

() ()( ). _

=+−−











−−

∑

≤≤−

mXW

In order to ensure correct operation, a number of argument checks are performed, and

are as follows :-

iw_size < 1

ix_size < 1

ix_size < iw_size

res_shift < 0

res_shift > 27

If any of the above cases are true, the function will abort and return the flag

EDSP_BAD_ARG.

As is the case with most of the DSPLib examples, optimum performance will be

obtained if the program and data are located in on chip memory. Clearly the ability to

achieve this depends upon the Microcontroller used, and program size. It is

recommended that at the very least one of the input arrays be located in on chip RAM,

which will help to avoid bus conflicts.

When using the function it is important to pay attention to the array sizes in order to

avoid stack corruption.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

The inputs can be different sizes, but input w must be larger than input x. The size of

the output array is clearly essential, and is defined as X-W+1.

By way of an example, the partial convolution method is now explained. Following on

from previous examples in this section, we will convolve two data sets, one containing

1234 and the other 56789. Clearly, one array must be larger than the other in order to

satisfy the argument checking described earlier.

1234 ⇐ Held static

xxx5678

xx56789

x56789

56789 ⇐ 1st convolution

56789 ⇐ 2nd convolution

56789x

56789xx

56789xxx

As can be seen, the larger array is shifted across the smaller one. There are only two

instances when the convolutional result is made up from results obtained within both

arrays. The x’s represent areas where results would be calculated from outside array

boundaries, such as in ConvComplete, but in this case are rejected. Hence, in the above

example, the output array would contain two elements, which holds true to the equation

X-W+1 = 5-4+1 = 2.

5.2.4 Correlate

Correlate offers complete correlation between two arrays, and deposits the solution in a

third array. Convolution is the same as correlation, only one of the input arrays is in

reverse order, i.e. time reversed. The function is described as follows :-

int Correlate(output,iw,ix,iw_size,ix_size,no_corr,

res_shift)

Where,

short output[] output data (size X-W+1)

short iw[] input w

short ix[] input x

long iw_size size of w (size W)

long ix_size size of x (size X)

long no_corr number of correlations M to compute

int res_shift right shift applied to output

The user can control the number of correlations made. Correlation outside the array can

be performed, and zeros will automatically be inserted as required by the function.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

The function is mathematically described as follows :-

ym wixi m

Wres shift

() ()( ). _











−−

∑

120

≤<

In order to ensure correct function operation, a number of argument checks are

performed. The criteria for a bad argument is as follows :-

iw_size < 1

ix_size < 1

no_corr < 1

ix_size < iw_size

res_shift < 0

res_shift > 27

If one of the above is true, the function will abort and return the flag EDSP_BAD_ARG.

As is the case with most of the DSPLib examples, optimum performance will be

obtained if the program and data are located in on chip memory. Clearly the ability to

achieve this depends upon the Microcontroller used, and program size. It is

recommended that at the very least one of the input arrays be located in on chip RAM,

which will help to avoid bus conflicts.

The actual operation of Correlate is shown in an example at the end of this section (5.3).

However, in order to clarify the operation of shifting arrays relative to each other during

correlation calculation an example follows.

Assume that the data set 1234 is to be correlated with 56789. Clearly, by looking at the

argument checking, the second array must be larger than the first.

The process starts with both arrays in line, and then shifts one relative to the other the

user specified number of times.

1234 ⇐ Held static

56789 ⇐ 1st Correlation

56789 ⇐ 2nd Correlation

567890 ⇐ 3rd Correlation

5678900 ⇐ 4th Correlation

56789000 ⇐ 5th Correlation

Clearly the definition of the size of the output array requires care. The array will contain

one value for each correlation performed, and therefore should be no_corr elements

long. Special care should be exercised here since failure to allocate sufficient elements

in the output array will result in corruption of the stack due to the nature of the C

programming language.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

5.2.5 CorrCyclic

CorrCyclic offers the same functionality as Correlate, only the input arrays are shifted in

a cyclic manner relative to each other. The function is defined as follows :-

int CorrCyclic( output, iw, ix, size, res_shift)

Where,

short output[] output data

short iw[] input w

short ix[] input x

long size size of input arrays N

int res_shift right shift applied to output

The function requires two input arrays of the same size. The two inputs are cyclically

correlated and the solution returned in a third array called output. In certain

circumstances, the result can be false, in which case Correlate must be used. It is

important that the user has a good understanding of the correlation process before using

this function. CorrCyclic can be mathematically described as follows :-

ym wixi mN

Nres shift

() ()( ). _











−−

∑

120

≤<

In order to ensure that the function operates correctly, a number of argument checks are

performed. The argument checking criteria is as follows :-

size < 1

res_shift < 0

res_shift > 27

If any of the above are true, the function will abort and return the flag

EDSP_BAD_ARG. For the purpose of development, it is important to test for this flag.

However, once a working system is achieved, the argument checki ng can be removed in

order to speed up execution time.

As is the case with most of the DSPLib examples, optimum performance will be

obtained if the program and data are located in on chip memory. Clearly the ability to

achieve this depends upon the Microcontroller used, and program size. It is

recommended that at the very least one of the input arrays be located in on chip RAM,

which will help to avoid bus conflicts.

Since the two data sets are cyclically correlated, the output array should be the same size

as the two input arrays. In the example below, the shifts involved in correlating two

four element arrays are shown, and it is also shown that the output solution contains four

elements.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

The data set ‘1234’ is to be cyclically correlated with the data set ‘5678’. The shifts

involved in this process are as follows :-

1234 ⇐ Held static

5678 ⇐ 1st correlation

8567 ⇐ 2nd correlation

7856 ⇐ 3rd correlation

6785 ⇐ 4th correlation

The example above should show that the array containing the results is the same size as

the input arrays, in this case four elements. It is essential that the user defines an array

of suitable size to contain the correlation solutions. Failure to do this will result in the

corruption of the stack.

5.3 Correlation Example

By way of an example, this section shows how the correlation function can be used to

analyse the random properties of the noise generated by GenGWnoise. Sample C code

is given, as well as plots of output data.

Correlation can be used to determine how random a series of white noise samples are.

This test uses the DSPLib function GenGWnoise which is explained in the

miscellaneous section of this document. The function generates a series of white noise

samples which are claimed to be random.

The randomness of the white noise samples can be ascertained by performing auto

correlation on the array. This means that the same set of samples are loaded into both

input arrays in the Correlate function. The function then performs correlation, and

returns a solution in a third array. The plot of this solution can be seen in diagram 5.2

below.

The correlation process will highlight likeness between the input arrays. In our case, if

the data is truly random, the level of correlation will be very low. The reason for this is

that with random noise, each sample is has an equal probability of being positive or

negative. Most samples will therefore cancel each other out, and will thus return a low

correlation. The one exception to this is the first correlation, i.e. the first element of the

output array. At this instance, each element of the input array is being correlated to

itself since no shifting has yet taken place. This means therefore that a very high level

of correlation will take place, and a large value returned to the first element. Once the

data is shifted by one place, the level of correlation will become very low. This can be

seen in Diagram 5.2 over the page, and serves to prove the random properties of the

data.

The white noise was generated by GenGWnoise, and is discussed in the next chapter.

DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation

Hitachi Micro Systems Europe Ltd

Au tocorrelat ion Of Gaussian Whit e Noise

-5000

5000

10000

15000

20000

25000

30000

1 16 31 46 61 76 91 106 121 136 151 166 181 196

Series1

Diagram 5.2

The above plot confirms the random properties of the white noise generated by

GenGWnoise. It is clear to see the large amount of correlation when the input arrays

were aligned. Once the data was shifted, the correlation remained low indicating the

lack of periodicity in the white noise.

C Code Implimentation

#include <stdio.h>

#include <math.h>

#include <ensigma.h>

#define NSAM 128

void main(void)

{/*** Define Local Variables ***/

short noisea[NSAM], noiseb[NSAM], result[NSAM];

int res_shift, a;

float varience;

varience = 32767;

res_shift = 15;

/*** Generate White Noise ***/

if(GenGWnoise(noisea, NSAM, varience) != EDSP_OK)

printf(“Cannot Generate Noise”);

/*** Copy Arrays For Auto Correlation ***/

for(a = 0; a < NSAM; a++)

noiseb[a] = noisea[a];

/*** Perform Auto Correlation ***/

if(Correlate(result, noisea, noiseb, NSAM, NSAM, NSAM, res_shift)!=

EDSP_OK)

printf(“Problem With Correlate”);

}

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

6.0 Miscellaneous Functions

Introduction

This section offers the user a number of useful functions which may be used to ‘glue’

certain DSP functions together. All the functions are unrelated, and perform small tasks

which the user would be expected to require when implementing a number of DSP

operations on an SH Microcontroller.

Because of the diverse nature of the functions, software examples have not been

provided. It is however hoped that all the functions are reasonably self explanatory, and

should cause no problems.

6.1.1 GenGWnoise

This function offers a method of generating true Guassian white noise, of a length

specified by the user. The noise generated has zero mean, and a user specified variance.

The method of generation is described in detail in the related Ensigma text, but

essentially uses the rand operator in C, and arranges pairs of samples to a certain

criteria.

The function is defined as follows :-

int GenGWnoise( output, no_samples, variance)

Where,

short output[] array containing white noise samples

long no_samples number of output samples required

float variance user defined variance of noise distribution

In order for the function to operate correctly, the user must define no_samples to be one

or greater, and also set the variance to a value greater than 0.0. Failure to ensure this

will result in the function aborting and returning the flag EDSP_BAD_ARG.

The equations to determine the white noise samples are as follows :-

or xx

=−

ln( ) /

or xx

=−

ln( ) /

The variables r1 and r2 represent two numbers generated by the random number

generator

rand, between -1 and 1. These two random numbers are then squared and

added together. If the sum is less than one, then they will be used in the above

equations. If the sum is greater than one, another pair of random numbers will be

generated, and tested in the same manner. This process continues until the user

specified number of noise samples have been generated.

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

Samples are generated in pairs by the equations shown above. If an odd number of

samples are required, the second of the two solutions is discarded. It should also be

noted that this function isn’t suitable for real time operation due to the floating point

arithmetic employed.

6.1.2 MatrixMult

MatrixMult offers an optimised method of performing matrix multiplication. Two

matrices are multiplied together and the result deposited in a third. The output matrix

should not be ‘in-place’, i.e. should not conflict with either of the input matrices.

The function is defined as follows :-

int MatrixMult(op_matrix, ip_matrix1, ip_matrix2,

no_rows1, no_cols1, no_cols2, res_shift)

Where,

void *op_matrix pointer to first element of output matrix

void *ip_matrix1 pointer to first element of input matrix1

void *ip_matrix2 pointer to first element of input matrix2

long no_rows1 row dimension of matrix1 (m)

long no_cols1 column dimension of matrix1 &

row dimension of matrix2 (n)

long no_cols2 column dimension of matrix2 (p)

int res_shift right shift applied to each output

The function will fail if any of the dimensions m,n or p are less than one. Also the

stated shift applied to the output matrix elements must be greater than zero, and less

than twenty eight. If the function fails the flag EDSP_BAD_ARG is returned.

Care must be exercised when specifying array size to contain the array solution. It

should be noted that void pointers are used to point to the first element of the matrices.

When multiplying two matrices, it is essential that they are conformable. That is the

column of the first matrix must be the same size as the row of the second. The

dimension of the resulting matrix will be m x p.

Matrices are stored in t he standard C manner, in one array. The elements are ordered as

follows :-

aaa

012

345

678













would be represented by the array :-

{}

aaaaaaaaa

012345678

,,,,,,,,

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

In order to ensure maximum efficiency, the user should ensure that matrix1 is located in

on chip RAM. Clearly the ability to do this depends upon the SH device being used, as

well as other memory allocation considerations. Clearly the function will operate

correctly with both matrices located off chip, but bus conflicts may arise which will

slow down the execution time.

6.1.3 VectorMult

This function offers an optimised routine for multiplying two vectors together. The

result of the multiplication is stored in a third array of the same size as the inputs. The

output array should be located in a different memory location to both of the inputs, i.e. it

should not be in-place. A typical application of this function is to apply a window to a

set of data.

The function is defined as follows :-

int VectorMult(output,ip1,ip2,no_elements,res_shift)

Where,

short output[] array containing output

short ip1[] input array 1

short ip2[] input array 2

long no_elements number of elements in each array

int res_shift right shift applied to each output

The function will abort if the user specifies the number of elements to be less than one.

It will also abort if the right hand shift applied is less than zero or greater than sixteen.

If the function aborts, the flag EDSP_BAD_ARG will be returned.

It is important that the two input arrays used with this function are the same size, and

that the no_elements argument is set to this size, or less. Clearly, if the user doesn’t

want to multiply all the array elements together, the no_elements argument is set to a

value less than the input array size.

Vector mult can be represented by the following C code. Clearly, if the user wishes to

use this type of process, VectorMult should be used since it offers an optimised version

compared to that generated by most C compilers. This will be of particular importance

in real time applications where execution time is of great importance.

short output[NSAM], ip1[NSAM], ip2[NSAM];

long NSAM = 128;

int res_shift = 0, a;

for(a=0; a < NSAM; a++)

output[a] = (ip1[a] * ip2[a]);

As can be seen, the function performs element wise multiplication. I.e. it multiplies the

first element of ip1 by the first element of ip2 and so on.

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

If the user requires the calculation of the dot product of two arrays (every element

multiplied by every element), the matrix mult should be used with n set to one. For

more details on MatrixMult the reader should consult section 6.1.2.

6.1.4 MsPower

MsPower offers a function which will simply calculate the mean square power of an

input array presented to it. The result is returned as a long (32-bits) in a single element

pointed to by a pointer defined by the user.

The function is defined as follows :-

int MsPower( output, input, no_elements)

Where,

long *output result

short input[] input array

long no_elements number of elements contained in input array

The function performs simple argument checking, and will return EDSP_BAD_ARG if

no_elements is less than one.

The function is mathematically described as follows :-

Mean Square Power ==

−

∑

Nxi

N()

As can be seen the function simply returns the mean of the sum of the squares of the

elements in the input array. The result of the division is rounded to the nearest integral

value. There is a risk of overflow if the number of el ements specified is more th an 232-

6.1.5 Mean

This function follows directly from that described above. The function calculates the

mean of a set of data presented to it in an array. The single element solution is a short

and is pointed to by a pointer defined by the user.

The function is defined as follows :-

int Mean( mean, input, no_elements)

Where,

short *mean mean of input data x

short input[] input array x

long no_elements number of elements N in x to process

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

The function performs basic argument checking, and will return the flag

EDSP_BAD_ARG if the number of elements N specified is less than one.

The function is mathematically described as follows :-

xNxi

−

∑

1()

The result of the division is rounded to the nearest integral. If the user specifies a

number of elements that is larger than 216-1 then the possibility of overflow exists.

6.1.6 Variance

This function offers the user a routine to calculate the variance of a set of input data.

The user presents the function with an array containing the data to be analysed, and the

function will then return both the mean of that data and also the variance.

The function is defined as follows :-

int Variance( variance, mean, input, no_elements)

Where,

long *variance variance of the input array

short *mean mean of the input array x

short input[] input data x

long no_elements number of elements N in input array

Basic argument checking is performed, and the function will abort and return the flag

EDSP_BAD_ARG if the number of elements specified is less than one.

Mathematically, the function performs two tasks. Firstly it calculates the mean of the

data, and makes this solution available to the user. Secondly, the variance is calculated

using the solution to mean calculated previous. The equations for mean and variance

are as follows :-

mean x Nxi

== =

−

∑

1()

var ( ( ) )

ience Nxi x

== −

−

∑

In each case, the results of the divisions are rounded to the nearest integral value. The

result of mean is returned as a short which is 16-bit. Variance solution is returned as a

long 32-bit value. There is a possibility of mean overflowing if the number of elements

is greater than 216-1. The variance calculation is not checked for overflow.

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

6.1.7 MaxI

This function will locate the maximum value in an array presented to it, and will then

return a pointer to that value. It does not return the value itself. Maxl is defined as

follows :-

int Maxl( max_ptr, input, no_elements)

Where,

short **max_ptr address of pointer to max element

short input[] input array to be searched

long no_elements number of elements in input to be searched

The function will abort and return the flag EDSP_BAD_ARG if the user specifies less

than one element in the input array. The address of the element with the maximum

value is returned in max_ptr. If two elements or more have the same maximum value,

the address of the element nearest to the start of the array will be returned.

6.1.8 MinI

Minl compliments the above function in that it returns the address of the lowest element

in the input array presented to the function. The actual minimum value is not returned,

but instead a pointer to it. The function is defined as follows :-

int Minl( min_ptr, input, no_elements)

Where,

short **min_ptr address of pointer to min value

short input[] input array to be searched

long no_elements number of elements to be processed

The function will return the flag EDSP_BAD_ARG if the user specifies a number of

elements that is less than one.

As with the above function Maxl, if there are two or more min elements the function

will return a pointer to the element that is closest to the start of the array.

Timings for these functions are impossible to quote since the total execution time

depends upon the number of elements in the array being searched. One method of

further improving the execution speed is to remove the argument checking, as long as

the number of elements in the input array is greater than one, this will not cause and

problem, nor compromise the reliability of the function.

DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions

Hitachi Micro Systems Europe Ltd

It should be noted that removal of argument checking is only recommended on simple

functions such as this one. For more complicated functions, the argument checking will

provide an effective tool for debugging processes.

6.1.9 Peakl

This final function is simply a variation on Maxl. The difference with this function is

that it will return the address of the maximum absolute value in the input array. I.e. it

looks for the largest value irrespective of sign. As an example in an array of two

elements {30,-50} Peakl would return a pointer to the -50 element.

The function is defined as follows :-

int Peakl( peak_ptr, input, no_elements)

Where,

short **peak_ptr address of pointer to the peak element

short input[] input array to be searched

long no_elements number of elements in input to be processed

The function performs basic argument checking and will return the flag

EDSP_BAD_ARG i f the user speci fies th e number of elem ents to be l ess t han one. The

process can be speeded up by removing the argument checking, but the reader should

first read the note at the base of the previous page.

As with the previous two functions, if there are two elements or more that have the same

absolute peak value, the function will return the address of the element that is closest to

the start of the array.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 79

7.0 DSPLib Implementation

Introduction

This final section of the application notes sets out to give a typical example of how a

working system based around the SH1 processor can be implemented. By the very

nature of DSP, it is very unlikely that this example will exactly suit the needs of the

reader due to the great diversity of digital signal processing. However, the aim of this

section is to give the reader some ideas about hardware set-up, as well as programming

around the functions offered by DSPLib. Furthermore, it is hopped that consultation of

this section will speed up the design process, and help to avert possible time consuming

problems.

Implementation can be split into two distinct areas, hardware and software. The

hardware design considerations are considered first, followed by some typical software

examples to tie in. Clearly, any software examples given will be fairly hardware

dependant, but should help the reader to understand the process of interfacing DSPLib

with the real world.

7.1 Hardware

The hardware implementation of a DSP process is greatly dependant upon the process it

is intended for. Having said that, every DSP system can be broken down and shown to

contain at least two if not all of these three distinct sections :-

Analogue To Digital Conversion

Digital Processing

Digital To Analogue Conversion

A system containing two of the above sections is the compact disc player for example.

The data is already in digital format, and so requires no analogue to digital conversion.

Conversely, a mobile telephone contains all three, with analogue data coming from the

microphone, and also being sent to the speaker.

The hardware example given in the following text contains all three sections, and is

capable of performing digital processing on real analogue signals, and returning the

solutions as real analogue signals.

It is important for the reader to remember that the data conversion processes are not

simply digital to analogue or analogue to digital converters, but instead require the

careful consideration of the use of sampling rates, and also filtering. In the data

conversion process, the use of anti-aliasing and anti-imaging filters are required. Since

it is the purpose of this document to show how the Hitachi SH-1 processor can be

interfaced, notes on filter design are not included.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 80

The filters used in a typical application take the form of a low pass network with a

Butterworth response. The filter order chosen depends upon the acceptable level of

aliasing/imaging but would typically be around 6th order implementations.

7.1.1 Analogue to Digital Conversion

This process consists of three distinct blocks, and is shown below. The real world

analogue signal is firstly bandlimited by the anti-aliasing filter. The purpose of this

filter is to prevent signals that are near or above the Nyquist frequency (twice the

sampling frequency) passing into the sampling device. As discussed above, the filter

would normally take the form of a low pass network with Butterworth response.

The band limited signal is passed into the analogue to digital converter. In the case of

this example, the on chip ADC has been used, which produces a ten bit representation of

the analogue data. The ADC is controlled by a sample rate generator. The sample rate

dictates the ADC conversion process, and should be chosen to be at least twice the

highest frequency component present in the input.

In this particular example, the bandwidth of the input signal is 3.4Khz, and the sampling

rate was chosen to be 8Khz. This system should in part help to mimic telephone based

systems which have similar specs.

There are many methods open to the user for sample rate generation. In order to gain

maximum efficiency, it is suggested that the sample rate be synchronised to the CPU

clock. For the purposes of this example, the CPU clock has been divided by external

hardware dividers, eventually arriving at a frequency of 7.8Khz, which is an acceptable

rate for the sampling of the given signal bandwidth.

Anti-Aliasing

Filter (6th Order)

Sample Rate

Generator

Analogue To Digital

Converter

Analogue

Input

CPU

Clock

Band Limited

Analogue Signal

10-bit

Data Out

Diagram 7.1 :- Analogue Interface

As mentioned earlier, the Hitachi SH-1 processor contains an on chip ADC. This device

gives ten bit resolution, and helps to simplify the implementation of a real time system.

Clearly if the user requires a higher resolution, an external ADC should be used. The

device could be memory mapped, and accessed in the same manner as RAM. Notes on

using the internal ADC are included in the software section of this document. For

further information on the SH-1 peripherals, the reader should consult the

SH7032/SH7034 hardware manual.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 81

Once the data is converted into digital form, it is suitable to present to the DSPLib

routines. The actual interface software used to lo ad data in, and also write data out at a

rate of 8Khz is described in the software section.

7.1.2 Digital To Analogue Conversion

Once the digital processing is complete, it is necessary to convert the descrete time

domain samples returned by DSPLib into a continuous analogue waveform. This

process involves two stages, and is rather obviously the reverse to the analogue

conversion process described above.

Because the SH-1 series of microcontrollers don’t have on chip DACs, the user must

interface a device to it. The actual specific details of how this can be achieved are

included in the circuit description section of this report.

Keeping to a more general form, the best location for a DAC on the SH series of devices

is on the data bus. By locating the DAC on the memory map, the device can be

accessed exactly like a piece of RAM. The access time to the DAC can be controlled by

the insertion of wait states, and these are discussed fully in the software description

section at the end of this report.

The use of serial data is very popular in many DSP systems today. The SH-1 has two

bi-directional serial ports which could be used to interface to serial DACs as well as

ADCs. It should however be noted that the respective converters would most probably

not interface directly to the serial port, and would in fact require some ‘glue’ logic. The

main reason for this is that the serial protocol used by the SH series would probably not

be suitable for the converter being connected. The use of some ‘glue’ logic would

enable the associated start and stop bits to be removed/modified inorder to satisfy the

chosen serial device.

When converting data from digital to analogue, it is essential that the resulting

waveform is filtered in order to remove most imaging frequencies. The imaging

frequencies are generated as a result of the steps that are produced by the DAC. These

steps are referred to as quantisation, and reflect the accuracy of the converter. As the

converter can only output one of say 1024 levels (10-bit), the waveform is stepped in

between. The sharp edges of the steps create many harmonics which roll off with a

sinex/x function in the frequency domain.

The purpose of the anti-imaging filter is to allow frequencies only in the bandwidth of

interest to be passed, thus stripping the signal of the unwanted quantisation steps. It is

specific to the user to specify the response of the filter. The filter is normally low pass,

and exhibit s Butterworth charact eristics. In the case of the ex ample system described in

this section, a 6th order filter was chosen, with a 3dB bandwidth of 3.4Khz.

For both the ADC and DAC process, use was made of a virtual ground. The purpose of

this process is to make the signal presented to the ADC purely positive. The DC offset

introduced was then removed by software.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 82

Once the DSP operation is complete, the DC offset is re-introduced, and the purely

positive output signal then sent to the DAC. The output of the DAC is de-coupled in

order to remove the signals DC content. By using this method, a single quadrant

converter can be used at both ends, thus helping to keep system cost to a minimum at

the expense of only a few clock cycles.

7.1.3 Circuit Description

The reader should consult the enclosed schematic of the example system hardware when

reading the following description, which can be found at the end of this section.

The input signal is applied on the ADC_In line on the left of the schematic. This signal

has been band limited to 0 - 3.4Khz in order to prevent aliasing occurring.

Once applied, the signal is de-coupled by C1 which has the effect of removing any DC

component present in the signal. Next the input is passed through a unity gain buffer.

This stage is included to isolate the filter stage from the ADC process. It is important to

ensure that the final stage of the anti-aliasing filter is not loaded in any way, which

could in turn cause the filter response to be altered. By using a buffer, the final stage of

the filter will ‘see’ a high impedance, thus removing any risk of loading. The output of

the buffer is fed back to the inverting input, which forces the amplifier to act as a

voltage follower. Once buffered, the signal is once again decoupled by C2, which

removes any DC introduced by the amplifier.

The next stage concerns the introduction of a 2.5 volt DC offset or ‘virtual ground’. The

purpose of this process is to ensure that the signal applied to the ADC is purely positive.

Up to this point, the input signal has been around five volts peak to peak, centred around

zero. Now, a DC offset is added to the signal, which centres the signal around 2.5 volts,

shifting the whole signal positive. This will now ensure that the whole signal is

sampled with no loss of information. After sampling, the DC offset is removed using

software techniques. This process is described in the software section of this document.

The DC offset is generated by a dedicated virtual ground generator device U8. There

are a number of these devices on the market, and any three pin type should be suitable.

It is important to remember that the reference should be as noise free as possible, as

noise here is introduced into the signal. The use of a bandgap type generator should

ensure low noise operation. It is important that a zener device is not used as these are

very noisy and will degrade the systems signal to noise ratio.

Prior to application to the ADC, the signal is passed through an over voltage protection

circuit provided by D1 and D2. If the input is between 0 and 5 volts, the diodes have no

effect on the signal. However, if the voltage should rise above the supply rail, the diode

will have the effect of clamping the signal to the supply. This is also true if the signal

should fall lower than zero, diode D1 will clamp the input to ground. This arrangement

is included to offer some protection over the SH-1 on chip ADC.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 83

After the input protection, the signal is in suitable form to be applied to the SH-1 ADC.

This ADC has eight multiplexed inputs, and in the case of this example, input AN0 has

been chosen. For more information on the SH-1 hardware configuration, including the

ADC, the reader should consult the SH7032/SH7034 Hardware Manual.

The SH-1 ADC can be configured such that it is triggered externally, as is the case in

this example. The trigger is applied to the ADTRG line located on pin 63. The trigger

is a square wave with 50% duty cycle, and conversion starts on each falling edge. The

trigger signal is derived from the 16Mhz CPU clock which is available from the CK line

on pin 71. This output is derived from the system crystal, and has been cleaned up and

passed through a duty cycle correction circuit.

The 16Mhz signal is repeatedly divided by three four bit ripple counters U4, U5 and U6.

The end effect is 16Mhz / 2048 = 7.8Khz. The resultant signal from U6 is used to start

the ADC. An alternative method for sample rate generation is to program the SH-1

timing pattern generator. Details on this subject may be found in the SH-1 Hardware

Manual.

The final section of hardware to explain is the digital to analogue conversion. This is

achieved by way of a memory mapped 12-bit device U2. Although a twelve bit device

has been chosen for this example, the output resolution is only really 10-bit due to the

10-bit ADC used. If the reader wanted a 16-bit system, the method of connection is

identical, only the device would use the first sixteen data lines on the bus instead of the

first twelve in this case.

Because the device is memory mapped, a small amount of address decoding logic is

required. The SH-1 has most of the address decoding provided on chip, and the memory

map for the device can be seen in its associated hardware manual. In the case of this

example, the DAC is located in area 6 which starts at address E000000 and ends at

EFFFFFF. When an address within that area is accessed, the lin e CS6 falls low. In the

case of a write operation, which is what happens with the DAC, the WR pin falls low as

well. U3 controls the chip select on the DAC, by ensuring that the CS pin on the DAC

falls low when both the CS6 line and the WR line are low. By the nature of the real

time operation of the system, the CS pin on the DAC should fall low at the same rate as

the ADTRG pin, in this case 7.8Khz. The width of the write cycle can be controlled by

the SH-1 wait state controller. Correct programming of this is important to ensure that

the write cycle is long enough to be seen by the DAC. In the case of the AD668 device

shown, a write cycle of length 125nS was used, which equates to two CPU clock cycles.

The DAC used in this example is an Analog Devices AD668. This device is suited to

direct connection to CPU buses due to its data latch arrangement. It is important that

the power supply arrangements are noted, a split rail supply of 16 volts is required. For

more details on this device, the appropriate data sheet should be consulted.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 84

The output of the DAC is 0 - 5volts and will contain a large amount of quantisation

noise. The removal of the DC offset can be achieved by de-coupling the signal, before

passing it to a suitable Butterworth low pass filter arrangement.

7.2 Software

This section describes the relatively straight forward software routines required to pass

real time data through the system outlined in the earlier hardware description.

The routines are specific to the hardware configuration detailed, but are written such

that modification for any application should be quite simple.

The main program is written in ANSI C, and in turn calls a number of functions. For

optimum performance, these functions have been written in assembler. The assembler

format shown in the examples below is SH series, and will work with the relevant

Hitachi tools. It should be noted that the code will not assemble with the GNU tools

that are available for the SH series. In order to achieve this , the format of the code will

have to be changed as appropriate.

7.2.1 Main()

This is the main program which as described calls the other routines in turn. The code

can be seen below.

#include <stdio.h>

#include <math.h>

#include "coeff.dat"

#include <ensigdsp.h>

#define NSAM 1

#define no_sections 1

void system(void);

void dac(short*);

void adc(short*);

void detect(short*);

void main(void)

{short *pointer;

short sample; /* Data From ADC */

short output; /* Data For DAC */

short *result; /* Start DSP operation Flag */

short *work;

int end = 0;

pointer = &sample;

result = &output;

system(); /* Initialise System (Wait State = 1)*/

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 85

/** Filter workspace **/

if(InitIir(&work, no_sections) != EDSP_OK)

printf("Problems .....");

/** Set Up ADC And DMAC **/

adc(pointer);

/** Real Time Loop **/

{

detect(pointer); /** Wait For End Of DMA **/

Iir1(result,sample,coeff,no_sections,work)

dac(result); /*Output Data To Memory Mapped DAC */

}

while(end == 0);

}

Following the above C code, the operation is as follows :-

The code implements an IIR filter using the function Iir1 from DSPLib. The first part of

the code is the include statements, and as well as the expected header files, the ensidsp.h

is included. This header file contains all the function definitions for DSPLib, as well as

some frequently used constants. At this point, the coefficients for the IIR filter are

included as well in the file coeff.dat.

After some function prototypes and constant declarations, the code starts proper. The

first function call is to system(). This function is not part of DSPLib, but is instead a

small section of code which sets up the hardware on the SH-1. This function is

described in detail in section 7.2.2.

Following on from system(), the IIR filter workspace is initialised. This function is

described in section 4.2.3. In order for this function to operate correctly, it is essential

that the memory sections are defined properly in the linker command file. For further

information on the Hitachi linker, the reader should consult the H series linkage editor

users manual.

Next in the code, the on-chip ADC is initialised. This function adc() is described in

section 7.2.3, and requires a pointer which is used to point to the sampled value. The

function makes use of the DMA (Direct Memory Access) controller. When the ADC

has completed its conversion, the result is moved into a memory location pointed to by

the pointer provided by the user in one clock cycle.

Now we enter the real time loop. Until now, the code has been set-up, and only

executed once. Inside this loop, the processor performs sample by sample computation.

The first function we meet is called detect(). This function forces the processor to wait

for the end of the DMA operation before continuing the operation.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 86

This ensures that the DSP operation is only performed on a new sample. The function is

described in section 7.2.4.

Once the end of the DMA operation has been detected, the DMA controller is reset, and

the DSP operat ion can take pl ace on the si ngle sample received. In order t o achieve real

time operation, it is essential that the DSP task ends before a new sample arrives.

When the DSP task is complete, the function dac() is used to output the result to the

memory mapped DAC described in the hardware section of this document. The

function dac() is described in detail in section 7.2.5.

7.2.2 system()

The H series assembler for this function is as follows :-

_system:

;Initalise Wait States For DAC

;Set Up Pin Function Controller

MOV.L PACR1,R3

MOV.L SETUP,R2 ;Enable ADTRG Pin

RTS

MOV.W R2,@R3

MOV.L WCR3,R3

MOV.L WAIT,R2 ;One Wait State

RTS

MOV.W R2,@R3

.ALIGN 4

WCR3 .data.l H'05FFFFA6

PACR1 .data.l H'05FFFFC8

SETUP .data.l H'0000330A

WAIT .data.l H'00008800

.END

This function performs two tasks. The first task is to set up the pin function controller

on the SH-1 such that the ADTRG pin is selected. This will allow the on chip ADC to

be externally triggered via that pin. This process is achieved by setting up the PACR1

Finally, the function sets up the wait state controller such that the write cycle to the

external DAC is of a sufficient period. In this particular example, the write cycle is two

CPU clock cycles long. The register WCR3 was used to control this task. Details on

both the registers mentioned here can be found in the SH-1 hardware manual.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 87

7.2.3 adc(short*)

The SH assembler listing for this function is as follows :-

_adc:

;Set Up DMAC

MOV.L SAR0,R1

MOV.L ADDRA,R2

MOV.L R2,@R1

MOV.L DAR0,R1

MOV R4,R2

MOV.L R2,@R1

MOV.L TCR0,R1

MOV #H'01,R2

MOV.W R2,@R1

MOV.L CHCR0,R1

MOV.L DATA,R2

MOV.w R2,@R1

;Start DMAC

MOV.L DMAOR,R1

MOV #H'01,R2

MOV.W R2,@R1

;Set Up ADC

MOV.L ADCSR,R1

MOV #H'048,R2

MOV.B R2,@R1

MOV.L ADCR,R1

MOV #H'0FF,R2

RTS

MOV.B R2,@R1

;Register Vectors

.ALIGN 4

DATA .data.l H'0000D09

ADDRA .data.l H'05FFFEE0

ADCSR .data.l H'05FFFEE8

ADCR .data.l H'05FFFEE9

SAR0 .data.l H'05FFFF40

DAR0 .data.l H'05FFFF44

TCR0 .data.l H'05FFFF4A

CHCR0 .data.l H'05FFFF4E

DMAOR .data.l H'05FFFF48

.END

This function performs several tasks. When calling the function, the user provides a

pointer to a variable. This pointer is used by the function to set up the DMA controller

such that the ADC value is transferred to it in one clock cycle. The setting of the DMA

is quite intricate.

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 88

It is therefore recommended that the reader consult the SH-1 hardware manual DMA

section. The registers seen above can be followed, and set up to suit a particular

application.

When the DMA set up is complete, the DMA controller is started. This now means that

when the ADC generates an interrupt at the end of a conversion, the DMA will transfer

the data to the destination address in one clock cycle.

The final part of this code sets up the internal ADC. The options chosen include an

external trigger option and also an interrupt generation which in turn starts the DMA

controller. From now onwards in this code, the ADC will always start converting when

the external trigger pin falls low. After each DMA operation, some registers need

resetting. This task is performed by the detect() function described in the next section.

7.2.4 detect(short*)

The detect routine forces the process to wait until the input data is valid, i.e. a new

sample. The assembly listing of this function is as follows :-

_detect:

CLRT

MOV.L CHCR0,R1

LOOP: MOV.W @R1,R2

EXTU.W R2,R0

TST #H'02,R0

;Test bit goes to one if AND is zero

BT LOOP

;Convert Sampled Data pointed to by R4

MOV.W @R4,R1

EXTU.W R1,R2

SHLR R2

ADD #H'03,R2

MOV.L DATA,R1

SUB R1,R2

MOV.W R2,@R4

;Start DMA And ADC Again

;Reset TCR0 & CHCR0 & DMAOR

MOV.L CHCR0,R1

MOV.L DATA1,R2

MOV.W R2,@R1

MOV.L TCR0,R1

MOV #H'01,R2

MOV.W R2,@R1

MOV.L DMAOR,R1

MOV #H'01,R2

MOV.W R2,@R1

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 89

;Reset ADCSR

MOV.L ADCSR,R1

MOV #H'048,R2

RTS

MOV.B R2,@R1

.ALIGN 4

DAC .data.l H'0E000000

TCR0 .data.l H'05FFFF4A

CHCR0 .data.l H'05FFFF4E

DMAOR .data.l H'05FFFF48

DATA1 .data.l H'00000D09

ADCSR .data.l H'05FFFEE8

DATA .data.l H'000008A3

.END

The first part of this code consists of a loop which tests for the end of a DMAC

operation. The program will stay in that loop until the DMA has finished, then come

out and execute the remaining code. The end of a DMA means that a new sample is

ready for processing. By making the CPU wait for a new sample, the risk of processing

the same sample twice is removed.

Next, the new sample is converted into the correct form ready for presentation to the

DSP task. The sample from the on chip ADC sits in the upper 10 bits of a sixteen bit

word. The ten bits are therefore shi fted right by four places and the t wo least si gnificant

bits set to one. The result of this process is that the data is now in 12 bit form suitable

for the DAC. Next, the DC offset introduced by the hardware is removed by a single

cycle subtraction. If the chosen DSP task is operating correctly, it should return a 12 bit

solution which can have an offset applied to it, and then sent to the DAC. That process

is described in section 7.2.5.

Once the data is in suitable form, the DMA controller is reset and started again. The

transfer count register is reset, and the controller is started again. This means that a

DMA operation can take place during the DSP task, and the new sample is ready for

processing immediately.

Finally, the ADC is started again by writing to the ADCSR register. For full details on

the setting and controlling of SH-1 peripherals, the SH-1 hardware manual should be

consulted.

7.2.5 dac(short*)

This final routine simply takes the solution returned by the DSP process, and writes it to

the memory mapped DAC. The SH assembly listing for this function is as follows :-

DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation

Hitachi Micro Systems Europe Ltd 90

_dac:

;Output Value To DAC

MOV.L DAC,R3

MOV.W @R4,R2

MOV.L OFFSET,R1

ADD R1,R2

RTS

MOV.W R2,@R3

.ALIGN 4

DAC .data.l H'0E000000

OFFSET .data.l H'00000800

.END

As can be seen, the DAC is located at address E000000. This location is in area 6, and

ties in with the hardware description earlier. The value returned by the DSP process has

a DC offset added to it. The offset is hex 800, which corresponds to half of the twelve

bit output. Therefore, with this offset, the maximum output swing is achievable. Half

of full scale corresponds to 2.5volts, which as the hardware section will confirm, is the

same as the offset applied to the input.

Once the offset is applied the value is written to the DAC as if it were a piece of

memory. The write process has already been set-up in the system() function described

in 7.2.2. When the value has been written, the program counter returns to detect() and

waits until a valid sample is in memory, and then the process starts again.

7.3 Timing

When writing code for a real time process it is essential that the operation stays within

the allotted time space. For the system described in this section, the processing time can

be calculated as follows.

CPU = 16MHz

Sample Rate = 7.8 KHz

Between each sample there is 128uS. This time corresponds to 2048 CPU clock cycles.

From this total, the IO operations steal cycles as does the conversion of data.

For optimal execution speed DO NOT compile the ANSI C functions included in

DSPLib. These are included for reference only, and will not lead to an efficient

implementation of a given DSP task. The reader should use the optimised source code

in the optsrc directory in DSPLib, or alternatively simply use the ensigma.lib during the

link operation, which is in turn built from the optimised code.

DSPLib For SH Application Note (19-031/1.0) Appendix One

Hitachi Micro Systems Europe Ltd 92

Appendix One

Ensigma Software Guide For DSPLib

DSPLib for SH

User Guide

E. Andrews

Sept 15, 1995



Ensigma Ltd 1995



Hitachi Micro Systems Europe Ltd 1995

Page2of87

Disclaimer

When using this do cument, keep the following in mind,

1. This do cumentmay, wholly or partially, b e sub ject to change without

notice.

2. All rights are reserved: No one is permitted to repro duce or duplicate,

in any form, the whole or part of this do cument without Hitachi's

permission.

3. Hitachi will not be held responsible for any damage to the user that

may result from accidents or any other reasons during the operation of

the user's unit according to this do cument.

4. Circuitry and other examples described herein are meant only to in-

dicate the characteristics and p erformance of Hitachi's semiconductor

products. Hitachi assumes no resp onsibility for anyintellectual prop-

erty claims or other problems that may result from applications based

on the examples therein.

5. No license is granted by implication or otherwise under any patents or

other rights of any third party or Hitachi, Ltd.

6. MEDICAL APPLICATIONS: Hitachi's products are not authorised for

use in MEDICAL APPLICATIONS without the written consentofthe

appropriate ocer of Hitachi's sales company. Such use includes, but is

not limited to, use in life supp ort systems. Buyers of Hitachi's pro ducts

are requested to notify the relevant sales oce when planning to use

the products in MEDICAL APPLICATIONS.

Page3of87

Contents

1 Intro duction 6

2 Installation 8

3 Data Formats 10

4 ANSI C Library 11

5 Eciency 12

6 Test and Example Programs 13

7 Fast Fourier Transforms 15

FftComplex

::: :: :: ::: :: :: :: :: ::: :: :: ::

FftReal

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

ItComplex

::: :: :: ::: :: :: :: :: ::: :: :: ::

ItReal

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

FftInComplex

:::: :: ::: :: :: :: :: ::: :: :: ::

FftInReal

:::: :: :: ::: :: :: :: :: ::: :: :: ::

ItInComplex

:::: :: ::: :: :: :: :: ::: :: :: ::

ItInReal

:::: :: :: ::: :: :: :: :: ::: :: :: ::

LogMagnitude

:::: :: ::: :: :: :: :: ::: :: :: ::

InitFft

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

FreeFft

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

8 Windowing 34

GenBlackman

:::: :: ::: :: :: :: :: ::: :: :: ::

GenHamming

:::: :: ::: :: :: :: :: ::: :: :: ::

GenHanning

::: :: :: ::: :: :: :: :: ::: :: :: ::

GenTriangle

::: :: :: ::: :: :: :: :: ::: :: :: ::

Page4of87

9Filters 39

Fir

:::: :: :: :: :: ::: :: :: :: :: ::: :: :: ::

Fir1

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

Iir

:::: :: :: :: :: ::: :: :: :: :: ::: :: :: ::

Iir1

:::: :: :: :: :: ::: :: :: :: :: ::: :: :: ::

DIir

:::: :: :: :: :: ::: :: :: :: :: ::: :: :: ::

DIir1

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

Lms

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

Lms1

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

InitFir

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

InitIir

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

InitDIir

:::: :: :: :: ::: :: :: :: :: ::: :: :: ::

InitLms

::: :: :: :: ::: :: :: :: :: ::: :: :: ::

10 Convolution and Correlation 64

ConvComplete

:::: :: ::: :: :: :: :: ::: :: :: ::

ConvCyclic

::: :: :: ::: :: :: :: :: ::: :: :: ::

ConvPartial

::: :: :: ::: :: :: :: :: ::: :: :: ::

Correlate

::: :: :: :: ::: :: :: :: :: ::: :: :: ::

CorrCyclic

:::: :: :: ::: :: :: :: :: ::: :: :: ::

11 Miscellaneous 74

GenGWnoise

:::: :: ::: :: :: :: :: ::: :: :: ::

MatrixMult

::: :: :: ::: :: :: :: :: ::: :: :: ::

VectorMult

::: :: :: ::: :: :: :: :: ::: :: :: ::

MsPower

::: :: :: :: ::: :: :: :: :: ::: :: :: ::

Mean

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

Variance

::: :: :: :: ::: :: :: :: :: ::: :: :: ::

MaxI

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

MinI

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

PeakI

::::: :: :: :: ::: :: :: :: :: ::: :: :: ::

Page5of87

A Contents of Distribution Disk 86

References 87

Page6of87

1Intro duction

This do cument describes

DSPLib-SH

, a library of signal pro cessing routines

developed for use with Hitachi's SH RISC pro cessors.

Twoversions of the library are provided. In the rst version all of the time-

critical routines are C-callable, but have been written in SH assembler to

minimise execution time whilst maintaining accuracy.

The second version of the library contains no assembly code. All routines are

written in portable ANSI-C and aim to give results that are bit identical to

the optimised assembly library. These libraries allow a development metho d-

ology beginning with simulation and program developmentonaworkstation

using the ANSI-C library. Development will then transfer to the target hard-

ware, using the optimised version of the library, where hardware dep endencies

and real-time issues can be resolved.

This methodology enables designers to gain the b enets of using ecient

hand-coded software for the time-critical sections of their programs, whilst

retaining the benets of developing software written in `C'.

Five groups of routines are included in

DSPLib-SH



Fast Fourier Transforms



Windowing Functions



Filters



Convolution and Correlation



Miscellaneous

The routines are all re-entrant. That is, they do not use any lo cal static

variables. This enables them to be used in a multi-tasking environment, or

to be called from interrupt routines without aecting their use in the main

program.

Timings are given for the most time-critical routines. The timings are for a

10 MIPS SH-1, with all internal program and data accesses taking a single

Page7of87

processor cycle over the 32 bit memory bus. Timings for other clo ck speeds

can be derived by scaling those given here.

The routines provided in this library have been optimised for the SH-1 pro-

cessor. SH-2 and SH-3 processors will also benet from

DSPLib-SH

, but

there will be some dierences in execution times compared to the SH-1.

Page8of87

2Installation

DSPLib-SH

is shipped on a single 3

" MS-DOS disk, the contents of which

are listed in Appendix A. It comprises a C header le,

ensigma.h

, the library

routines themselves and examples of the library's use. The optimised library

is in the

library

directory, in the pre-assembled ob ject library

ensigma.lib

The source for the optimised and ansi-C libraries are supplied in the

optsrc

and

ansisrc

Examples showing the use of each routine are provided in the

examples

directory on the disk.

The header le and libraries should b e installed in a directory visible to

the C compiler and linker resp ectively. It is recommended that the same

directories as the compiler support les is used. Assuming that this directory

shc

shc2.0

and the oppy disk driveis

, the installation pro cedure

on an MSDOS PC is as follows|

1. Make a backup of the distribution disk.

2. Copy

DSPLib-SH

onto the system disk. Put the distribution disk

into drive

and enter the following:

XCOPY A:\* C:

shc

shc2.0 /S

3. Installation is now complete.

Should it be necessary to rebuild the optimised library from source|

1. Change to the

optsrc

directory.

2. Edit

make.bat

and

lbr.sub

to check that the conguration matches

your system.

3. Type

make

ensigma.lib

has now been rebuilt.

For the C version of the library, the build pro cess is both op erating system

and compiler dependent. For a Unix operating system, the procedure is as

follows|

Page9of87

1. Copy

DSPLib-SH

onto the system disk.

2. Change to the

ansisrc

directory.

3. Edit

makefile

to correspond to your system.

4. Type

make

ensigmaC.a

has now been built.

Page 10 of 87

3 Data Formats

The ma jority of calculations performed in

DSPLib-SH

use xed point arith-

metic. Both integer and fractional number representations are used; these

formats are discussed below.

Integer arithmetic represents numbers with the decimal point xed at the

right of the LSB (least signicant bit). A signed 16 bit integer variable can

representvalues in the range [-2

{1]. When two signed 16 bit integers

are multiplied, the signed 31 bit pro duct is aligned at the LSB. If the result

is stored in a 16 bit integer numerical overowmay o ccur.

Fractional arithmetic is also useful, especially in DSP applications. Here

the decimal p oint is xed immediately to the right of the most signicant

bit (MSB) for signed numbers (or immediately to the left of the MSB for

unsigned numbers). Signed 16 bit fractions can representvalues in the range

[-1:1-2

]. When two signed 16 bit fractions are multiplied the signed 31

bit product is aligned at the MSB. If the result is stored as a 16 bit fraction

numerical underowmay o ccur.

DSPLib-SH

makes extensive use of b oth integer and fractional arithmetic.

To maximise the exibility of the library many of the routines allow the align-

ment of each output to b e specied. For example, calling

Fir

with

res shift

corresponds to LSB aligned (integer) lter coecients. When

res shift

=15

the processing corresp onds to MSB aligned (fractional) lter coecients.

For most possible values of

res shift

, there is a p ossibilityofnumerical over-

ow. Except where stated otherwise,

DSPLib-SH

detects this condition

and saturates the value to the appropriate full-scale value. This minimises

the error introduced, at the exp ense of a small increase in processing time

when overow occurs.

Implementation

The C

short

and

int

types provide xed point arithmetic. On the SH, signed

16 bit integers correspond to

short

and signed 32 bit integers correspond to

long

int

Page 11 of 87

4 ANSI C Library

An ANSI-C version of

DSPLib-SH

is provided in

dsplibsh

ansisrc

This version is supplied to ease application development, allowing the early

stages to be p erformed without the constraints imposed by particular hard-

ware.

The C library and optimised SH-1 library give bit identical results for most

valid inputs, for all routines except the FFT and double precision lter rou-

tines, where errors in the least signicant bits may o ccur.

The exception to this rule o ccurs when the 42 bit accumulator (MACH/MACL)

overows. In this case the assembler library on an SH-1, the assembler li-

brary on an SH-2 and the C library on a host can all give dierent results.

However, this can only happ en with input vectors containing more than 1023

samples, so it is extremely unlikely in any practical use of

DSPLib-SH

Compiler Restrictions

The C library must b e build with an ANSI C compiler, with the restriction

that variables of type

double

allow the exact representation of 42 bit integers,

and the p ointer representation must require four or fewer bytes.

In practise all Unix and MSDOS ANSI C compilers are suitable as they use

8bytes to hold double variables, typically allowing the exact representation

of integers of up to 53 bits.

Page 12 of 87

5 Eciency

The routines provided by

DSPLib-SH

have been hand optimised for maxi-

mum speed on SH-1 pro cessors. In many cases the library routines approach

the theoretical optimum p erformance of the architecture, with minimum in-

struction counts and minimal pip eline conicts.

As SH is a RISC architecture, it is relatively straightforward to maximise the

eciency of the library based on the following two features: The assembly

routines provided by

DSPLib-SH

have been written so that most program

fetches do not cause pro cessor stalls, as long as the program memory supp orts

32 bit reads; furthermore, most of the processing in

DSPLib-SH

operates

on 16 bit data.

Therefore, when dening the memory map of a target system, the following

two recommendations should be ob eyed whenever p ossible:



the program code segment should b e lo cated in memory that supports

single cycle 32 bit reads, and



the data segment should b e located in memory that supp orts single

cycle 16 (or 32) bit reads and writes.

If there is sucientinternal 32 bit memory, this would be a suitable lo cation

for the library code and data. If other memory must b e used the recom-

mendations ab ove should b e followed whenever p ossible | b ecause dierent

devices in the SH family contain dierenttyp es of memory and supp ort

dierent caching schemes, it is not p ossible to provide completely general

advice.

Page 13 of 87

6 Test and Example Programs

Test programs for all library routines are provided in the

examples

directory

on the distribution disk. These carefully test each library routine in turn.

Among other things they test for correct argumentchecking and perform

anumber of bit-exact tests comparing the results with precalculated data.

These tests also illustrate the use of the library routines.

The test functions are supplied in the subdirectory

test

.Test data is in the

testdat

subdirectory. The main test les are|

testfft.c

FFT test functions.

testwin.c

Window test functions.

testfilt.c

Filter test functions.

testconv.c

Convolution and correlation test functions.

testmisc.c

Miscellaneous test functions.

These are built using the make utility.Typing

make all

builds

testfft.abs

and

testfft.mot

etc. Alternatively individual test functions may be made

bytyping eg.

make testconv.abs

In addition to the individual test les and function library, the following les

are provided|

testhead.h

Prototypes for test functions.

link.cmd

Linker subcommand le based on the SH7030 memory map.

c0.src

C startup le.

make.bat

MSDOS build commands.

makefile

(Unix) makele.

A simple demonstration program that illustrates the use of common library

functions is provided in the directory

demo

. This illustrates the use of t,

window, lter and miscellaneous functions.

The demonstration program is built using the following les:

Page 14 of 87

demo.c

The demonstration program.

demoio.c

Data input/output functions. These functions simulate the

data interfaces that would be present in a complete system.

demoio.h

Function prototypes for

demoio.c

link.cmd

Linker subcommand le based on the SH7030 memory map.

c0.src

C startup le.

make.bat

MSDOS build commands.

makefile

(Unix) makele.

Page 15 of 87

7 Fast Fourier Transforms

Contents|

FftComplex

Compute not-in-place, complex FFT.

FftReal

Compute not-in-place, real FFT.

ItComplex

Compute not-in-place, complex inverse FFT.

ItReal

Compute not-in-place, real inverse FFT.

FftInComplex

Compute in-place, complex FFT.

FftInReal

Compute in-place, real FFT.

ItInComplex

Compute in-place, complex inverse FFT.

ItInReal

Compute in-place, real inverse FFT.

LogMagnitude

Convert complex FFT output to log magnitude format.

InitFft

Generate FFT lo okup tables.

FreeFft

Release FFT lookup table memory.

These functions calculate the classical forward and inverse Fourier trans-

forms, with a user dened scaling. The forward transform is dened by

jn=N

where

(shift) is the number of stages where an additional halving is p er-

formed (see the

Scaling

section b elow) and

is the number of complex

points.

The inverse transform is dened by

jn=N

Page 16 of 87

Routine Timings

The execution time in milliseconds of the FFT routines on a 10 MIPS SH-1

are|

Not in place

In place

FFT Inverse FFT FFT Inverse FFT

Size Cmplx Real Cmplx Real Cmplx Real Cmplx Real

128 1.94 1.06 2.19 1.20 1.94 1.06 2.20 1.19

256 4.43 2.37 4.94 2.66 4.46 2.37 4.97 2.64

512 9.98 5.29 11.00 5.88 10.03 5.32 11.06 5.86

1024 22.23 11.70 24.28 12.88 22.39 11.75 24.44 12.83

Complex ArrayFormat

Time and frequency series of complex samples are used as inputs and outputs

to the Fourier transform routines. The complex array format used in all these

routines is alternating real-imaginary i.e. a sequence of

complex numbers

is stored in an array

]=Re

(

)

; x

+1]=Im

(

)



n<N

There is one exception to this format, when the frequency representation of

a real signal of

samples is stored in a complex arrayof

coecients. In

this case element

[0] contains the real comp onent of the DC comp onentof

the signal and

[1] contains the real component of the

2 frequency (b oth

DC and

2 comp onents are real, their imaginary comp onents are zero).

Scaling

In an FFT the signal power doubles at each natural radix-2 stage; the p eak

signal amplitude can also double. This doubling can cause arithmetic over-

ow when transforming a high power signal, but can be prevented by a di-

vision bytwo at each radix-2 stage. However, excessive halving of the signal

will add unnecessary quantisation noise.

Page 17 of 87

The optimum balance b etween overow and quantisation noise is highly de-

pendent on the characteristics of the input signal. A high p ower pure tone,

for example, will require the maximum scaling to avoid overow, whereas an

impulsive signal will require very little.

The safest approach is to halve the signal amplitude at every radix-2 stage.

As long as each complex input samples is scaled to havepower less than 2

this approach guarantees that overow will not o ccur. At the other extreme,

a forward FFT of a low amplitude signal with a at sp ectrum could be

performed with no halving without overow occurring.

A middle path is to halve the input signal on alternate stages.

DSPLib-SH

allows this approach, and also allows ner control of scaling, with halving

selectable individually for each radix-2 stage. Careful selection of this scaling

allows the combined eects of saturation and quantisation to b e minimised.

It should be emphasised that the conservative approachof checking the input

sample p ower and halving at each stage should always be used unless you

are condent of the properties of the input signal.

To sp ecify the required approach each FFT function has a

scale

parameter.

scale

is interpreted least signicant bit rst, with one bit corresponding to

each radix-2 stage. A division bytwo is p erformed at all stages for which the

corresponding

scale

bit has b een set.

For example,



scale

0XFFFFFFFF

(or

size

-1) species halving on every stage, giving a

nal output with 1/

size

times the p ower of the input, with a guarantee

that no overow has o ccurred if the initial complex samples all have

power less than 2



scale

0X55555555

species halving on alternate stages, giving a nal

output with the same p ower as the input (if there are an even number

of stages).



scale

species no halving, giving a nal output with

scale

times

the power of the input,

EFFTALLSCALE

EFFTMIDSCALE

and

EFFTNOSCALE

, dened in

ensigma.h

, can

be used to provide these constants.

Page 18 of 87

Test Functions

The test le

testfft.c

is supplied for the routines in this section. The

functions

TestFft

and

TestIt

test the FFT and IFFT transforms resp ectively.

Page 19 of 87

FftComplex

Denition :

int

FftComplex

(

output, input, size, scale

)

Arguments :

short

output[ ]

complex output data.

short

input[ ]

complex input data.

long

size

size of FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates a complex Fast Fourier Transform. The calcu-

lation is not-in-place, so the input and output arrays must not overlap

(see

FftInComplex

for an in-place version of this routine).

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

output

should be lo cated in on-chip

memory.

Page 20 of 87

FftReal

Denition :

int

FftReal

(

output, input, size, scale

)

Arguments :

short

output[ ]

positive frequency complex output data.

short

input[ ]

real input data

long

size

size of FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates a real Fast Fourier Transform by performing a

complex FFT of half the size and then transforming the data.

On returning,

output

contains positive frequencies only. The negative

frequencies are simply the complex conjugate of the p ositive frequen-

cies. Since the frequency values at b oth 0 and

2 are real, the

output is placed in the second elementof

output

i.e. where the imagi-

nary component of zero frequency would have b een stored.

The calculation is not-in-place, so the input and output arrays must

not overlap (see

FftInReal

for an in-place version of this routine).

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft

Page 21 of 87



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

output

should be lo cated in on-chip

memory.

Page 22 of 87

ItComplex

Denition :

int

ItComplex

(

output, input, size, scale

)

Arguments :

short

output[ ]

complex output data.

short

input[ ]

complex input data.

long

size

size of inverse FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates an inverse Fast Fourier Transform. The calcu-

lation is not-in-place, so the input and output arrays must not overlap

(see

ItInComplex

for an in-place version of this routine).

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

output

should be lo cated in on-chip

memory.

Page 23 of 87

ItReal

Denition :

int

ItReal

(

output, input, size, scale

)

Arguments :

short

output[ ]

real output data.

short

input[ ]

positive frequency complex input data.

long

size

size of inverse FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates an inverse Fast Fourier Transform by transform-

ing the data and p erforming a complex FFT of half the size.

input

should contain the complex frequency data for the positive fre-

quencies only. The routine assumes that the negative frequencies are

simply the complex conjugate of the positive frequencies. Since the

frequency values at b oth 0 and

2 can only b e real, the

2value

should b e placed in the second elementof

input

where the imaginary

componentof 0 would have been stored. When

ItReal

returns

output

contains a series of real data samples.

The calculation is not-in-place, so the input and output arrays must

not overlap (see

ItInReal

for an in-place version of this routine).

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.

Page 24 of 87



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

input

and

output

should b oth be lo cated

in on-chip memory.

Page 25 of 87

FftInComplex

Denition :

int

FftInComplex

(

data, size, scale

)

Arguments :

short

data[ ]

complex data.

long

size

size of FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates an in-place complex Fast Fourier Transform.

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

data

should be lo cated in on-chip mem-

ory.

Page 26 of 87

FftInReal

Denition :

int

FftInReal

(

data, size, scale

)

Arguments :

short

data[ ]

real data on input, complex p ositive

frequency data on output.

long

size

size of FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates an in-place real Fast Fourier Transform by per-

forming a complex FFT of half the size and then transforming the

data.

On returning,

data

contains the complex transform data for the p osi-

tive frequencies only. The negative frequencies are simply the complex

conjugate of the p ositive frequencies. Since the frequency values at

both 0 and

2 are real, the

2 output is placed in the second el-

ementof

data

i.e. where the imaginary comp onentof0 would have

been stored. Although this slightly complicates the output format, it

enables the FFT to b e carried out in place.

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.

Page 27 of 87



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

data

should be lo cated in on-chip mem-

ory.

Page 28 of 87

ItInComplex

Denition :

int

ItInComplex

(

data, size, scale

)

Arguments :

short

data[ ]

complex data.

long

size

size of inverse FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates an in-place complex inverse Fast Fourier Trans-

form.

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

data

should be lo cated in on-chip mem-

ory.

Page 29 of 87

ItInReal

Denition :

int

ItInReal

(

data, size, scale

)

Arguments :

short

data[ ]

complex positive frequency data on in-

put, real data on output.

long

size

size of inverse FFT.

long

scale

scaling sp ecication.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

not a p ower of 2.

size

32768.

Description :

This routine calculates an in-place real inverse Fast Fourier Transform

by transforming the data and p erforming a complex FFT of half the

size.

data

should contain the complex frequency data for the p ositive fre-

quencies only. The routine assumes that the negative frequencies are

simply the complex conjugate of the positive frequencies. Since the

frequency values at b oth 0 and

2 can only b e real, the

2value

should b e placed in the second elementof

data

where the imaginary

componentof 0 would have b een stored. When

ItInReal

returns

data

contains a series of real samples.

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.

Page 30 of 87



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



scale

should b e

0xffffffff

and the peak signal p ower should b e

unless the prop erties of the input signal are well understood.



Only the b ottom log

(

size

) bits of

scale

are used; the MSB's are

ignored.



To minimise bus conicts

data

should be lo cated in on-chip mem-

ory.

Page 31 of 87

LogMagnitude

Denition :

int

LogMagnitude

(

output, input, no elements, fscale

)

Arguments :

short

output[ ]

real output

short

input[ ]

complex input

long

no elements

number of log magnitude outputs

required.

float

fscale

output scaling factor

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

fscale

j

(10 log

Description :

This routine calculates the log magnitude of the complex input data in

decibels, and writes the scaled result into the output array|

(

)=10

fscale

log

(

)

+1)

) 0



n<N

Notes :



The ordering of real and imaginary components in a complex array

is described on page 16.



Before calling this routine the bit reversal and twiddle tables

should b e initialised by calling

InitFft



output

can be specied to b e the same as

input

. In this case, the

output overwrites the rst half of the input array.

Page 32 of 87

InitFft

Denition :

int

InitFft

(

max size

)

Arguments :

long

max size

Maximum size of FFT that will be

required.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

max size

notapower of 2.

max size

32768.

Description :

This routine generates the sine/cosine and bit reversal tables used by

the FFT functions. This function need only b e called once, before the

rst FFT is called. The lo okup tables are stored in memory allo cated

malloc

Notes :



The lookup tables are generated for the specied transform size.

Smaller transforms will be p erformed using the same lookup ta-

bles.



The addresses of the lookup tables are stored in internal variables;

they should not b e accessed by user co de.

Page 33 of 87

FreeFft

Denition :

void

FreeFft

(

void

)

Arguments :

none

Returns :

void

Description :

This routine returns the memory used for FFT lookup tables to the

heap via

free

. No further FFT functions can be called b efore a subse-

quent

InitFft

Page 34 of 87

8 Windowing

Contents|

GenBlackman

Generate a Blackman window.

GenHamming

Generate a Hamming window.

GenHanning

Generate a Hanning window.

GenTriangle

Generate a Triangle window.

Test Functions

The test le

testwin.c

is supplied for the routines in this section. The

function

Tstwin

compares the window functions against precalculated data to

ensure accuracy. Note that the windows produced on dierent architectures

may dier in the least signicant bit.

Page 35 of 87

GenBlackman

Denition :

int

GenBlackman

(

output, win size

)

Arguments :

short

output[ ]

array to contain window

(

long

win size

window size

required.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

win size



Description :

This routine generates a Blackman windowin

output

VectorMult

can

be used to apply the window to a data array.

The function used is|

(

)=(2



5 cos



n



08 cos



n





n<N

Page 36 of 87

GenHamming

Denition :

int

GenHamming

(

output, win size

)

Arguments :

short

output[ ]

array to contain window

(

long

win size

window size

required.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

win size



Description :

This routine generates a Hamming windowin

output

VectorMult

can

be used to apply the window to a data array.

The function used is|

(

)=(2



46 cos



n





n<N

Page 37 of 87

GenHanning

Denition :

int

GenHanning

(

output, win size

)

Arguments :

short

output[ ]

array to contain window

(

long

win size

window size

required.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

win size



Description :

This routine generates a Hanning window in array

output

. The routine

VectorMult

can be used to apply the window to a data array.

The function used is|

(



cos



n





n<N

Page 38 of 87

GenTriangle

Denition :

int

GenTriangle

(

output, win size

)

Arguments :

short

output[ ]

array to contain window

(

long

win size

window size

required.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

win size



Description :

This routine generates a Triangular windowin

output

VectorMult

can

be used to apply the window to a data array.

The function used is|

(

)=(2









n<N

Page 39 of 87

9 Filters

Contents|

Fir

Implement a nite impulse resp onse lter.

Fir1

Implement a nite impulse resp onse lter for a single input

sample.

Iir

Implement an innite impulse resp onse lter.

Iir1

Implement an innite impulse resp onse lter for a single

input sample.

DIir

Implement a double precision innite impulse response

lter.

DIir1

Implement a double precision innite impulse resp onse lter

for a single input sample.

Lms

Implement a real adaptive FIR lter.

Lms1

Implement a real adaptive FIR lter for a single input

sample.

InitFir

Initialise FIR lter.

InitDIir

Initialise double precision I IR lter.

InitIir

Initialise I IR lter.

InitLms

Initialise LMS lter.

Routine Timings

The execution time in microseconds of the lter routines on a 10 MIPS SH-1

are given b elow. The times for the single sample versions are p er routine

call. The times for the multiple sample versions are on a p er sample basis,

computed by dividing the time to process a 100 sample frame by 100. Note

that the times vary for dierent result shift values.

No. of Function

coes Fir Lms Fir1 Lms1

5 8.6 24.9 17.2 34.7

10 10.3 40.6 18.9 50.4

100 46.1 328.4 54.7 338.2

Page 40 of 87

No. of Function

biquads Iir DIir Iir1 DIir1

1 10.5 40.1 18.1 48.0

2 19.7 77.6 28.8 86.4

20 185.5 752.8 221.4 784.3

Co ecient Scaling

All digital signal processing is likely to introduce noise into the signal due

to saturation or quantisation. The scaling of coecients must b e chosen

carefully to balance the eects of saturation and quantisation.

If the co ecients are to o large saturation may occur; if they are to o low

excessive quantisation noise maybe intro duced.

For FIR (nite impulse response) lters on the SH-1 no saturation will occur

coes

and

res shift

=26

However, for many input signals this scaling is overly pessimistic; smaller

result shifts (or larger co ecientvalues) could be used with a low likelihoo d

of saturation, but with signicantly reduced quantisation noise.

IIR (innite impulse response) lters have a recursive structure, which means

that the scaling approach describ ed above is inappropriate.

LMS (least mean squares) adaptive lters obey the same conventions as FIR

lters. However, the coecients may b e pushed into saturation as the co ef-

cients are adapted.

Workspace

Digital lters have state that must b e preserved from the pro cessing of one

sample to the next. This lter state, or workspace, must b e stored in memory

that can b e accessed with minimum overhead. Moreover, the lter state must

be initialised before the rst sample can be pro cessed.

The structure of the workspace memory is liable to change in the future,

so user programs should not attempt to read or modify this memory | it

should only b e accessed by the library functions.

Page 41 of 87

Test Functions

The lter functions are tested for bit exactness by the le

testfilt.c

. The

FIR, IIR and DI IR lters are tested by ltering a number of sinusoids through

alow-pass lter with a cut-o at exactly one-quarter the sample rate. The

LMS lter is tested by passing coloured (FIR ltered) noise through it and

requiring it to adapt to a set of precalculated lter coecients.

The test functions are

Tstr

Tstr1

Tstiir

Tstiir1

Tstdiir

Tstdiir1

Tstlms

and

Tstlms1

Page 42 of 87

Fir

Denition :

int

Fir

(

output, input, no samples, coe, no co es, res shift,

workspace

)

Arguments :

short

output[ ]

output samples

short

input[ ]

input samples

long

no samples

number of samples

to b e ltered.

short

coe[ ]

array of lter coecients

long

no co es

number of coecients

(length of

lter).

int

res shift

right shift applied to each output.

short

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no samples

no co es



res shift

27.

Description :

This routine implements a nite impulse resp onse (FIR) lter. It uses

workspace

to record the most recent input samples. The result of

ltering the data in

input

is written to

output

(

)

(

)

res shift

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Page 43 of 87

Notes :



Filtering is likely to introduce noise due to saturation or quantisa-

tion. If the co ecients are to o large saturation may occur; if they

are too low excessive quantisation noise maybe introduced. The

scaling of co ecients must be carefully considered to balance the

eects of quantisation and saturation.

One scheme that avoids saturation is to scale the co ecients so

that

abs(

coes

)

and

res shift

= 26. For many input

signals smaller result shifts could be used with a low likelihoo d of

saturation, but with signicantly reduced quantisation noise.



Before calling this routine for a new lter, initialise the lter

workspace by calling

InitFir



Execution is fastest when

res shift

is set to 17 or 18.



output

may b e the same as

input

in which case

input

is overwrit-

ten.

Page 44 of 87

Fir1

Denition :

int

Fir1

(

output, input, co e, no co es, res shift, workspace

)

Arguments :

short

*output

output sample

(

short

input

input sample

(

short

coe[ ]

array of lter coecients

long

no co es

number of coecients

(length of

lter).

int

res shift

right shift applied to each output.

short

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no co e



res shift

27.

Description :

This routine implements a nite impulse response (FIR) lter for one

sample only. It uses

workspace

to record the most recent input samples.

The result of ltering the data in

input

is written to

output

(

)

(

)

res shift

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Notes :

Page 45 of 87



Filtering is likely to introduce noise due to saturation or quanti-

sation. If the co ecients are to o large saturation may o ccur; if

they are too low excessive quantisation noise maybeintro duced.

the scaling of co ecients must be carefully considered to balance

the eects of quantisation and saturation.

One scheme that avoids saturation is to scale the co ecients so

that

abs(

coes

)

and

res shift

= 26. For many input

signals smaller result shifts could be used with a low likelihoo d of

saturation, but with signicantly reduced quantisation noise.



Before calling this routine for a new lter, initialise the lter

workspace by calling

InitFir



Execution is fastest when

res shift

is set to 17 or 18.

Page 46 of 87

Iir

Denition :

int

Iir

(

output, input, no samples, co e, no sections, workspace

)

Arguments :

short

output[ ]

output samples

short

input[ ]

input samples

long

no samples

number of samples

to b e ltered.

short

coe[ ]

lter co ecients.

long

no sections

number of second order lter sections

short

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no samples

no sections

16.

Description :

This routine implements an innite impulse resp onse (I IR) lter.

The lter is implemented as a cascade of

second order lters called

biquads, with an additional scaling performed on the biquad output.

This implementation uses the Direct Form I I representation, where the

coecients are specied in fractional (Q15) format and the output of

each biquad is given by

(

1) +

(

2)+2

(

)

(

)=[

(

1) +

(

2)]

Page 47 of 87

The input

(

)tothe

section is the output

(

) of the previous

section. The input to the rst (

= 0) section is taken from

input

. The

output from the last (

1) section is written to

output

Each biquad is calculated in 32 bits using saturating arithmetic. Fol-

lowing the multiply by2

or 2

, 16 LSB's are extracted from the

accumulator. This extraction rounds the surplus LSB's and checks the

surplus MSB's for overow, saturating if necessary.

The lter co ecients should b e sp ecied in

coe

in the order|

:::b

Notes :



Before calling this routine for a new lter, initialise the lter by

calling

InitIir



describes a shift rather than a multiply. The equivalentmul-

tiply would be by2

.If

0or

16 the output of the

biquad is undened.



output

may b e the same as

input

in which case

input

is overwrit-

ten.



To minimise bus conicts

workspace

should b e lo cated in on-chip

memory.

Page 48 of 87

Iir1

Denition :

int

Iir1

(

output, input, coe, no sections, workspace

)

Arguments :

short

*output

output sample

(

short

input

input sample

(

short

coe[ ]

lter co ecients.

long

no sections

number of second order lter sections

short

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no sections

16.

Description :

This routine implements an innite impulse resp onse (I IR) lter for

one sample only.

The lter is implemented as a cascade of

second order lters called

biquads, with an additional scaling performed on the biquad output.

This implementation uses the Direct Form I I representation, where the

coecients are specied in fractional (Q15) format and the output of

each biquad is given by

(

1) +

(

2)+2

(

)

(

)=[

(

1) +

(

2)]

The input

(

)tothe

th section is the output

(

) of the previous

section. The input to the rst (

= 0) section is taken from

input

. The

output from the last (

1) section is written to

output

coe

should contain co ecients in the order|

Page 49 of 87

:::b

Each biquad is calculated in 32 bits using saturating arithmetic. Fol-

lowing the multiply by2

or 2

, 16 LSB's are extracted from the

accumulator. This extraction rounds the surplus LSB's and checks the

surplus MSB's for overow, saturating if necessary.

Notes :



Before calling this routine for a new lter, initialise the lter by

calling

InitIir



describes a shift rather than a multiply. The equivalentmul-

tiply would be by2

.If

0or

16 the output of the

biquad is undened.



To minimise bus conicts

workspace

should b e lo cated in on-chip

memory.

Page 50 of 87

DIir

Denition :

int

DIir

(

output, input, no samples, co e, no sections, workspace

)

Arguments :

short

output[ ]

output samples

short

input[ ]

input samples

long

no samples

number of samples

to b e ltered.

long

coe[ ]

lter co ecients.

long

no sections

number of second order lter sections

long

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no samples

no sections

32 for

k<K

48 for

Description :

This routine implements an innite impulse resp onse (I IR) lter with

double precision coecients and delay node storage.

The lter is implemented as a cascade of

second order lters called

biquads, with an additional scaling performed on the biquad output.

This implementation uses the Direct Form I I representation, where the

coecients are specied in fractional (Q31) format and the output of

each biquad is given by

(

1) +

(

2)+2

(

)

(

)=[

(

1) +

(

2)]

Page 51 of 87

The input

(

)tothe

th section is the output

(

) of the previous

section. The input to the rst (

= 0) section is taken from

input

after

amultiply by2

. The output from the last (

1) section is

written to

output

coe

should contain coecients in the order|

:::b

DIir

diers from

Iir

in that the lter co ecients are specied as 32

rather than 16 bit values. Consequently 64 bit accumulators is used,

with intermediate biquad outputs stored as 32 bit quantities following

the shifts by31 or

. This extraction rounds the surplus LSB's and

checks the surplus MSB's for overow, saturating if necessary.In the

nal stage only 16 bits are retained following the shift by

; again

surplus LSB's are rounded and surplus MSB's are saturated.

Notes :



Before calling this routine for a new lter, initialise the lter by

calling

InitDIir



describes a shift rather than a multiply. The equivalentmul-

tiply would b e by2

.If

0or

32 (or

48)

the output of the biquad is undened.



The most common use of

DIir

species the co ecients in Q31

format. In this case

should be set to 31 for

k<K

1 and to

47 for



The least signicant bit of each 32 bit co ecient and delaynode

value is ignored.



When p ossible

Iir

should b e used in preference to

DIir

as it runs

an order of magnitude faster on the SH-1.



output

may b e the same as

input

in which case

input

is overwrit-

ten.



To minimise bus conicts

workspace

should b e lo cated in on-chip

memory.

Page 52 of 87

DIir1

Denition :

int

DIir1

(

output, input, coe, no sections, workspace

)

Arguments :

short

*output

output sample

(

short

input

input sample

(

long

coe[ ]

lter co ecients.

long

no sections

number of second order lter sections

long

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no sections

32 for

k<K

48 for

Description :

This routine implements a double precision innite impulse resp onse

(IIR) lter for one sample only.

The lter is implemented as a cascade of

second order lters called

biquads, with an additional scaling performed on the biquad output.

This implementation uses the Direct Form I I representation, where the

coecients are specied in fractional (Q31) format and the output of

each biquad is given by

(

1) +

(

2)+2

(

)

(

)=[

(

1) +

(

2)]

The input

(

)tothe

th section is the output

(

) of the previous

section. The input to the rst (

= 0) section is taken from

input

after

Page 53 of 87

amultiply by2

. The output from the last (

1) section is

written to

output

coe

should contain coecients in the order|

:::b

As with

DIir

the lter coecients and delaynodevalues are stored as 32

rather than 16 bit values. Consequently 64 bit accumulators are used,

with intermediate biquad outputs stored as 32 bit quantities following

the shift by31 or

. This extraction rounds the surplus LSB's and

checks the surplus MSB's for overow, saturating if necessary.In the

nal stage only 16 bits are retained following the shift by

; again

surplus LSB's are rounded and surplus MSB's are discarded.

Notes :



Before calling this routine for a new lter, initialise the lter by

calling

InitDIir



describes a shift rather than a multiply. The equivalentmul-

tiply would b e by2

.If

0or

32 (or

48)

the output of the biquad is undened.



The most common use of

DIir

species the co ecients in Q31

format. In this case

should be set to 31 for

k<K

1 and to

47 for



The least signicant bit of each 32 bit co ecient and delaynode

value is ignored.



When p ossible

Iir

should b e used in preference to

DIir

as it runs

an order of magnitude faster on the SH-1.



To minimise bus conicts

workspace

should b e lo cated in on-chip

memory.

Page 54 of 87

Lms

Denition :

int

Lms

(

output, input, ref, no samples, coe, no coes, res shift,

conv fact, workspace

)

Arguments :

short

output[ ]

output samples

short

input[ ]

input samples

short

ref output[ ]

desired output value

long

no samples

number of samples

to b e ltered.

short

coe[ ]

adaptive lter coecients

long

no co es

number of coecients

int

res shift

right shift applied to each output.

short

conv fact

convergence factor 2



short

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no samples

no co es



res shift

27.

Description :

This routine implements a real adaptive FIR lter using the least mean

square algorithm.

The FIR lter is dened as|

(

)

(

)

res shift

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

Page 55 of 87

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

The Widrow-Ho algorithm is used to update the lter co ecients|

(

)+2

e

(

)

(

)

where the addition is saturating and

(

) is the (saturated) error be-

tween the desired signal and the actual lter output|

(

)

(

)

The calculation of 2

e

(

)

(

) requires two16



16 multiplies. In

both multiplies bits [31:16] of the pro duct are retained, rounded and

saturated if necessary.

Notes :



Any digital signal processing can introduce noise due to saturation

or quantisation eects. If the co ecients are to o large saturation

may o ccur; if they are too low excessive quantisation noise may

be intro duced. The scaling of coecients must b e carefully con-

sidered to balance the eects of quantisation and saturation.

One scheme that avoids saturation is to scale the co ecients so

that

abs(

coes

)

and

res shift

= 26. For many input

signals smaller result shifts could be used with a low likelihoo d of

saturation, but with signicantly reduced quantisation noise.

However, it is not p ossible to guarantee that the co ecients gen-

erated by an LMS lter will conform to the scaling convention

described ab ove.



conv fact

should normally be positive; it should never b e 0x8000.



Before calling this routine for a new lter, initialise the lter

workspace by calling

InitLms



Execution is fastest when

res shift

is set to 17 or 18.



output

may be the same as

input

ref output

in which case

input

ref output

is overwritten.

Page 56 of 87



If no adaption is required 2



may b e set to zero. Alternatively

Fir

Fir1

may b e used with the same lter coecients and workspace.

Page 57 of 87

Lms1

Denition :

int

Lms1

(

output, input, ref output, co e, no co es, res shift,

conv fact,

workspace

)

Arguments :

short

*output

output sample

(

short

input

input sample

(

short

ref

desired output value

(

short

coe[ ]

adaptive lter coecients

long

no co es

number of coecients

int

res shift

right shift applied to each output.

short

conv fact

convergence factor 2



short

*workspace

variable containing p ointer to lter

memory.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no co es



res shift

27.

Description :

This routine implements a real adaptive FIR lter using the least mean

square algorithm, for a single input sample.

The FIR lter is dened as|

(

)

(

)

res shift

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

Page 58 of 87

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

The Widrow-Ho algorithm is used to update the lter co ecients|

(

)+2

e

(

)

(

)

where the addition is saturating and

(

) is the (saturated) error be-

tween the desired signal and the actual lter output|

(

)

(

)

The calculation of 2

e

(

)

(

) requires two16



16 multiplies. In

both multiplies bits [31:16] of the pro duct are retained, rounded and

saturated if necessary.

Notes :



Any digital signal processing can introduce noise due to saturation

or quantisation eects. If the co ecients are to o large saturation

may o ccur; if they are too low excessive quantisation noise may

be intro duced. The scaling of coecients must b e carefully con-

sidered to balance the eects of quantisation and saturation.

One scheme that avoids saturation is to scale the co ecients so

that

abs(

coes

)

and

res shift

= 26. For many input

signals smaller result shifts could be used with a low likelihoo d of

saturation, but with signicantly reduced quantisation noise.

However, it is not p ossible to guarantee that the co ecients gen-

erated by an LMS lter will conform to the scaling convention

described ab ove.



conv fact

should normally be positive; it should never b e 0x8000.



Before calling this routine for a new lter, initialise the lter

workspace by calling

InitLms



Execution is fastest when

res shift

is set to 17 or 18.



output

may be the same as

input

ref output

in which case the

input

or ref output is overwritten.

Page 59 of 87



If no adaption is required 2



may b e set to zero. Alternatively

Fir

Fir1

may b e used with the same lter coecients and workspace.

Page 60 of 87

InitFir

Denition :

int

InitFir

(

workspace, no co es

)

Arguments :

short

**workspace

address of variable containing p ointer

to buer reserved for lter memory.

long

no co es

number of coecients

Returns :

int

EDSP OK

success.

EDSP NO HEAP

insucient space available from

malloc

EDSP BAD ARG

no co es



Description :

This routine allocates, via

malloc

, the memory required for subsequent

calls to

Fir

and

Fir1

. The history of previous input samples is initialised

to zero.

Notes :



The

workspace

buer allocated by

InitFir

should only be manipu-

lated by

Fir

Fir1

Lms

and

Lms1

Page 61 of 87

InitIir

Denition :

int

InitIir

(

workspace, no sections

)

Arguments :

short

**workspace

address of variable containing p ointer

to buer reserved for lter memory.

long

no sections

number of second order lter sections

Returns :

int

EDSP OK

success.

EDSP NO HEAP

insucient space available from

malloc

EDSP BAD ARG

no sections

Description :

This routine allocates, via

malloc

, the memory required for subsequent

calls to

Iir

and

Iir1

. All delaynodevalues are initialised to zero.

Notes :



The

workspace

buer allo cated by

InitIir

should only be manipu-

lated by

Iir

and

Iir1

Page 62 of 87

InitDIir

Denition :

int

InitDIir

(

workspace, no sections

)

Arguments :

long

**workspace

address of variable containing pointer

to buer reserved for lter memory.

long

no sections

number of second order lter sections

Returns :

int

EDSP OK

success.

EDSP NO HEAP

insucient space available from

malloc

EDSP BAD ARG

no sections

Description :

This routine allocates, via

malloc

, the memory required for subsequent

calls to

DIir

and

DIir1

. All delaynode values are initialised to zero.

Notes :



The

workspace

buer allo cated by

InitDIir

should only be manip-

ulated by

DIir

and

DIir1

Page 63 of 87

InitLms

Denition :

int

InitLms

(

workspace, no co es

)

Arguments :

short

**workspace

address of variable containing p ointer

to buer reserved for lter memory.

long

no co es

number of coecients

Returns :

int

EDSP OK

success.

EDSP NO HEAP

insucient space available from

malloc

EDSP BAD ARG

no co es



Description :

This routine allocates, via

malloc

, the memory required for subsequent

calls to

Lms

and

Lms1

.The history of previous input samples is ini-

tialised to zero.

Notes :



The

workspace

buer allocated by

InitLms

should only be manip-

ulated by

Fir

Fir1

Lms

and

Lms1

Page 64 of 87

10 Convolution and Correlation

Contents|

ConvComplete

Calculate the complete convolution of two arrays.

ConvCyclic

Calculate the cyclic convolution of two arrays.

ConvPartial

Calculate the partial convolution of two arrays.

Correlate

Calculate the correlation between two arrays.

CorrCyclic

Calculate the cyclic correlation b etween two arrays.

Routine Timings

The execution time in microseconds of the lter routines on a 10 MIPS SH-1

are given below for

ConvPartial

ConvCyclic

CorrCyclic

and

Correlate

(with

no correlations past the end of the longer input). The times are given per

output sample, computed by dividing the time to produce

output samples

. Note that

is always the smaller input array.

Size of Routine

ConvPartial ConvCyclic CorrCyclic Correlate

5 5 9.7 9.7 9.0 8.1

10 10 10.6 11.2 10.2 9.0

100 100 54.6 55.4 47.7 46.4

Where correlations or convolutions past the ends of arrays take place, as

happens in

ConvComplete

and sometimes in

Correlate

, the output samples

are produced with less computation. The most extreme case of this is when

the two input arrays are the same size

, and the computations continue until

the arrays overlap by only one sample. In this case

ConvComplete

produces

1 samples, and

Correlate

produces

samples, with all but one of those

samples formed from computation past array ends.

Page 65 of 87

For this extreme case, the timings in microseconds per output sample, again

computed by dividing the time to pro duce

output samples by

, are|

Size of Routine

ConvComplete Correlate

5 9/5 6.5 8.4

10 19/10 7.1 8.2

100 199/100 29.0 26.1

Test Functions

The Convolution/Correlation functions are tested for bit exactness by the le

testconv.c

which calls the functions

Tconvcmp

Tconvcyc

Tconvpar

Tcorr

and

Tcorrcyc

Page 66 of 87

ConvComplete

Denition :

int

ConvComplete

(

output, iw, ix, iw size, ix size, res shift

)

Arguments :

short

output[ ]

output

short

iw[ ]

input

short

ix[ ]

input

long

iw size

size of

long

ix size

size of

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

iw size

ix size

res shift

27.

Description :

This routine completely convolves the two input arrays and puts the

result in the output array|

(

)

(

)

res shift



m<W

The routine uses zeros in place of elements b efore the start and after

the end of the two arrays.

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Page 67 of 87

Notes :



The output array size must b e at least



To minimise bus conicts

should b e lo cated in on-chip memory.

Page 68 of 87

ConvCyclic

Denition :

int

ConvCyclic

(

output, iw, ix, size, res shift

)

Arguments :

short

output[ ]

output

short

iw[ ]

input

short

ix[ ]

input

long

size

size of arrays

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

res shift

27.

Description :

This routine cyclically convolves the two input arrays and puts the

result in the output array|

(

)

(

)

res shift



m<N

where

= residue of

modulo

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Notes :



To minimise bus conicts

should b e lo cated in on-chip memory.

Page 69 of 87

ConvPartial

Denition :

int

ConvPartial

(

output, iw, ix, iw size, ix size, res shift

)

Arguments :

short

output[ ]

output

short

iw[ ]

smaller input

short

ix[ ]

larger input

long

iw size

size of

long

ix size

size of

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

iw size

ix size

iw size

res shift

27.

Description :

This routine convolves the two input arrays but does not include out-

puts derived from elements outside the array boundaries|

(

)

(

)

res shift



The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Notes :

Page 70 of 87



The output array size must b e at least

+1.



must b e the larger array, i.e.





To minimise bus conicts

should b e lo cated in on-chip memory.

Page 71 of 87

Correlate

Denition :

int

Correlate

(

output, iw, ix, iw size, ix size, no corr, res shift

)

Arguments :

short

output[ ]

output

short

iw[ ]

smaller input

short

ix[ ]

larger input

long

iw size

size of

long

ix size

size of

long

no corr

number of correlations

to compute.

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

iw size

ix size

no corr

ix size

iw size

res shift

27.

Description :

This routine correlates the two input arrays and puts the result in the

output array|

(

)

(

)

res shift



m<M

Correlation past the end of the

array is p ermissible, i.e.

X<W

In this case the routine uses zeros in place of the extra array elements.

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

Page 72 of 87

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Notes :



res shift

= 0 corresponds to a normal integer calculation.

res shift

= 15 corresp onds to a fractional calculation.



must b e the larger array, i.e.



, if the arrays are not the

same size.



To minimise bus conicts

should b e lo cated in on-chip memory.

Page 73 of 87

CorrCyclic

Denition :

int

CorrCyclic

(

output, iw, ix, size, res shift

)

Arguments :

short

output[ ]

output

short

iw[ ]

input

short

ix[ ]

input

long

size

size of arrays

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

size

res shift

27.

Description :

This routine cyclically correlates the two input arrays and puts the

result in the output array|

(

)

(

)

res shift



m<N

where

= residue of

modulo

The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Notes :



To minimise bus conicts

should b e lo cated in on-chip memory.

Page 74 of 87

11 Miscellaneous

Contents|

GenGWnoise

Generate Gaussian white noise.

MatrixMult

Multiply two matrices.

VectorMult

Multiply twovectors.

MsPower

Calculate mean square power.

Mean

Calculate mean.

Variance

Calculate mean and variance.

MaxI

Find maximum of integer array.

MinI

Find minimum of integer array.

PeakI

Find maximum absolute value of integer array.

Test Functions

The Miscellaneous functions are tested for bit exactness by the le

testmisc.c

The Gaussian white noise generator is tested by measuring the distribution

and spectrum of the samples. The `whiteness' of the spectrum and a Chi-

squared test are checked against performance targets. The remaining func-

tions are compared against precalculated data.

The functions

Tgwn

Tmatrixm

Tvectm

Tmean

Tvar

Tmspower

and

Textr

perform the testing.

Page 75 of 87

GenGWnoise

Denition :

int

GenGWnoise

(

output, no samples, variance

)

Arguments :

short

output[ ]

output white noise samples.

long

no samples

number of samples required.

float

variance

variance of noise distribution



Returns :

int

EDSP OK

success.

EDSP BAD ARG

no samples

variance



0.0.

Description :

This routine generates Gaussian white noise with zero mean and user-

specied variance. Samples are produced in pairs using the mo died

Box-Muller metho d described in [1]. To produce a pair of output sam-

ples, the standard random number generator provided by

rand

is used

to generate pairs of random numbers

between -1 and 1, until a

pair is found whose sum of squares

is less than 1. The pair of output

samples

are then calculated

r

(

)

r

(

)

Notes :



If an o dd number of samples is requested, the second sample of

the last pair is discarded.



The random number generator seed can be initialised to anyinte-

ger between 1 and 2

1 using

srand

Page 76 of 87



This routine is not strictly re-entrant since any calls to

rand

in an

interrupt routine or between calls to this routine will aect the

sequence of random numbers used. However, such calls will

not

aect the random properties of the white noise generated.



Floating point arithmetic is used in this function, so its use should

be restricted to test programs rather than application programs

whenever p ossible.

Page 77 of 87

MatrixMult

Denition :

int

MatrixMult

(

op matrix, ip matrix1, ip matrix2,

no rows1, no cols1, no cols2, res shift

)

Arguments :

void

*op matrix

pointer to rst element of output.

void

*ip matrix1

pointer to rst element of input 1.

void

*ip matrix2

pointer to rst element of input 2.

long

row dimension of matrix1.

long

column dimension of matrix1, row di-

mension of matrix2.

long

column dimension of matrix2.

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

m, n or p

res shift

27.

Description :

This routine multiplies the two matrices

ip matrix1

and

ip matrix2

and

stores the result in

op matrix

ip matrix1



ip matrix2



and

op matrix



The sum is accumulated in 42 bits. Each 16 bit output is extracted from

bit positions [res shift+15:res shift] of the accumulator. This extraction

rounds the surplus LSB's and checks the surplus MSB's for overow,

saturating if necessary.

Each matrix is stored in the normal `C' manner (row ma jor order)|

Page 78 of 87

Notes :



To minimise bus conicts

ip matrix1

should be located in on-chip

memory.



The function prototype sp ecies the array parameters as

void *

to allow arbitrary array sizes to b e sp ecied. These parameters

should p ointto

short

data.

Page 79 of 87

VectorMult

Denition :

int

VectorMult

(

output, ip1, ip2, no elements, res shift

)

Arguments :

short

output[ ]

output.

short

ip1[ ]

input.

short

ip2[ ]

input.

long

no elements

number of elements to evaluate.

int

res shift

right shift applied to each output.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

res shift

16.

Description :

This routine multiplies pairs of elements from

ip1

and

ip2

and stores

the results in

output

Notes :



Each 16 bit output is extracted from bit p ositions [res shift+15:res shift]

of the product. This extraction rounds the surplus LSB's and

checks the surplus MSB's for overow, saturating if necessary.



This routine p erforms elementwise multiplications. To calculate a

dot product use

MatrixMult

with

set to 1.

Page 80 of 87

MsPower

Denition :

int

MsPower

(

output, input, no elements

)

Arguments :

long

*output

result.

short

input[ ]

input.

long

no elements

number of elements

to evaluate.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

Description :

This routine calculates the Mean Square Power of the input data|

Mean Square Power =

(

)

Notes :



The division result is rounded to the nearest integral value.



The sum is accumulated in 64 bits. If

no elements

is more than

-1 overowmay o ccur.

Page 81 of 87

Mean

Denition :

int

Mean

(

mean, input, no elements

)

Arguments :

short

*mean

mean of data in

input

short

input[ ]

input

long

no elements

number of elements

to pro cess.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

Description :

This routine calculates the mean of

input

(

)

Notes :



The division result is rounded to the nearest integral value.



The sum is accumulated in 32 bits. If

no elements

is more than

-1 overowmay o ccur.

Page 82 of 87

Variance

Denition :

int

Variance

(

variance, mean, input, no elements

)

Arguments :

long

*variance

The variance of input



short

*mean

mean of data

short

input[ ]

input

long

no elements

number of elements

to pro cess.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

Description :

This routine rst calculates the mean of

input

(

)

It then computes the variance|



(

)

Notes :



The division results are rounded to the nearest integral values.



is accumulated in 32 bits and is not checked for overow. If

no elements

is more than 2

-1 overowmay o ccur.





is accumulated in 64 bits and is not checked for overow.

Page 83 of 87

MaxI

Denition :

int

MaxI

(

max ptr, input, no elements

)

Arguments :

short

**max ptr

address of pointer to the maximum

element.

short

input[ ]

input.

long

no elements

number of elements to pro cess.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

Description :

This routine searches

input

to nd the element with the maximum

value, and returns its address in

max ptr

Notes :



If several elements have the same maximum value the one nearest

the start of

input

is returned

Page 84 of 87

MinI

Denition :

int

MinI

(

min ptr, input, no elements

)

Arguments :

short

**min ptr

address of pointer to the minimum

element.

short

input[ ]

input.

long

no elements

number of elements to pro cess.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

Description :

This routine searches

input

to nd the element with the minimum

value, and returns its address in

min ptr

Notes :



If several elements have the same minimum value the one nearest

the start of

input

is returned

Page 85 of 87

PeakI

Denition :

int

PeakI

(

peak ptr, input, no elements

)

Arguments :

short

**peak ptr

address of p ointer to the peak element.

short

input[ ]

input.

long

no elements

number of elements to pro cess.

Returns :

int

EDSP OK

success.

EDSP BAD ARG

no elements

Description :

This routine searches

input

to nd the element with the maximum

absolute value, and returns its address in

peak ptr

Notes :



If several elements have the same p eak value the one nearest the

start of

input

is returned

Page 86 of 87

A Contents of Distribution Disk

The distribution disk has six main directories,

include

lib

ansisrc

optsrc

demo

and

test

with the contents listed b elow.

The

lib

directory contains the le|

ensigma.lib

The

include

directory contains the le|

ensigdsp.h

The

optsrc

directory contains the les|

black.c fftinc.asm iir1.asm logmag.c peak.asm

convcomp.asm fftinr.asm initdiir.c make.bat shift.asm

convcycl.asm fftr.asm initfft.c makefile triangle.c

convpart.asm fir.asm initfir.c matrix.asm variance.asm

corrcycl.asm fir1.asm initiir.c max.asm vectmult.asm

correlat.asm gengwn.c initlms.c mean.asm

diir.asm hamming.c lbr.sub min.asm

diir1.asm hanning.c lms.asm mspower.asm

fftc.asm iir.asm lms1.asm mult.asm

The

ansisrc

directory contains the les|

black.c diir1.c fir1.c ifftreal.c initlms.c mean.c

convcomp.c fftcom.c gengwn.c iir.c lms.c min.c

convcycl.c fftcore.c hamming.c iir1.c lms1.c mspower.c

convpart.c fftincom.c hanning.c initdiir.c logmag.c peak.c

corrcycl.c fftireal.c ifftcom.c initfft.c makefile triangle.c

correlat.c fftreal.c iffticom.c initfir.c matrix.c variance.c

diir.c fir.c ifftirel.c initiir.c max.c vectmult.c

The

demo

directory contains the les|

c0.src demo.c demoio.c demoio.h link.cmd make.bat makefile

The

test

directory contains the les|

c0.src tconvpar.c testfilt.c tgwn.c tstdiir1.c tstlms1.c

link.cmd tcorr.c testhead.h tifft.c tstfir.c tstwin.c

make.bat tcorrcyc.c testmisc.c tmatrixm.c tstfir1.c tvar.c

makefile testconv.c testwin.c tmean.c tstiir.c tvectm.c

tconvcmp.c testdat textr.c tmspower.c tstiir1.c

tconvcyc.c testfft.c tfft.c tstdiir.c tstlms.c

It also contains the

testdat

directory, which contains the les|

filtdat.txt tconvpar.txt tmatrixm.txt tstfir.txt tvar.txt

miscdat.txt tcorr.txt tmean.txt tstiir.txt tvectm.txt

tconvcmp.txt tcorrcyc.txt tmspower.txt tstlms.txt

tconvcyc.txt textr.txt tstdiir.txt tstwin.txt

Page 87 of 87

References

[1] W. H. Press, B. P. Flannery,S.A.Teukolsky and W. T. Vetterling.

Numerical Recip es in C.

Cambridge University Press 1988.