DSPLib For The Hitachi SH1 7000 Series Microcontroller Application Note 19-031/1.0 June 1996 Hitachi Micro Systems Europe Ltd 1996 When using this document, keep the following in mind, 1, This document may, wholly or partially, be subject to change without notice. 2, All rights are reserved: No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without Hitachi's permission. 3, Hitachi will not be held responsible for any damage to the user that may result from accidents or any other reasons during the operation of the user's unit according to this document. 4, Circuitry and other examples described herein are meant only to indicate the characteristics and performance of Hitachi's semiconductor products. Hitachi assumes no responsibility for any intellectual property claims or other problems that may result from applications based on the examples therein. 5, No license is granted by implication or otherwise under any patents or other rights of any third party or Hitachi, Ltd. 6, MEDICAL APPLICATIONS: Hitachi's products are not authorised for use in MEDICAL APPLICATIONS without the written consent of the appropriate officer of Hitachi's sales company. Such use includes, but is not limited to, use in life support systems. Buyers of Hitachi's products are requested to notify the relevant sales office when planning to use the products in MEDICAL APPLICATIONS. Copyright Hitachi, Ltd.,1996. All rights reserved DSPLib For SH Application Note(19-031/1.0) Preface Preface This manual is intended as a user guide for the Digital Signal Processing library DSPLib. DSPLib has been developed for use with the Hitachi `Super H' range of RISC microcontrollers, and in particular the SH-1 device. The organisation of this manual is as follows : Overview: Gives an overview of the library and its performance. Function Descriptions: Describes the use of each function, including background theory and software examples where necessary. DSPLib Implementation: Takes the reader through a typical DSPLib implementation. Includes notes on hardware interfacing to the SH-1 as well as examples of typical software routines. Related Manuals The reader should refer to the following Hitachi manuals when using DSPLib: DSPLib For SH User Guide SH7032/SH7034 Hardware Manual SH7000 Series Programming Manual SH Series C Compiler SH Series Cross Assembler H Series Linkage Editor For software development tools, contact your Hitachi sales office. Hitachi Micro Systems Europe Ltd i DSPLib For SH Application Note (19-031/1.0) Contents TABLE OF CONTENTS PREFACE ................................................................................................................................................i 1.0 OVERVIEW......................................................................................................................................1 1.1 GENERAL DESCRIPTION .......................................................................................................................1 1.1.2 Note About Data Types ................................................................................................................2 1.2 FUNCTION TIMINGS .............................................................................................................................3 1.2.1 FFT Timings ................................................................................................................................3 1.2.2 Filter Timings ..............................................................................................................................3 1.2.3 Convolution & Correlation Timings ............................................................................................4 2.0 FAST FOURIER TRANSFORMS (FFT).......................................................................................5 2.1 FFT BACKGROUND THEORY ...............................................................................................................5 2.1.1 Not In Place FFT .........................................................................................................................6 2.1.2 In-Place FFT ...............................................................................................................................7 2.1.3 FFT Scaling .................................................................................................................................8 2.1.4 Input / Output Array Format .......................................................................................................8 2.2 ROUTINE DESCRIPTIONS ......................................................................................................................9 2.2.1 FftReal .........................................................................................................................................9 2.2.2 FftComplex ................................................................................................................................13 2.2.3 IfftComplex ................................................................................................................................14 2.2.4 IfftReal .......................................................................................................................................16 2.2.5 FftInComplex .............................................................................................................................17 2.2.6 FftInReal....................................................................................................................................17 2.2.7 IfftInComplex .............................................................................................................................18 2.2.8 IfftInReal....................................................................................................................................18 3.0 WINDOW FUNCTIONS................................................................................................................21 3.1 FUNCTION DESCRIPTIONS ..................................................................................................................21 3.2 USING WINDOW FUNCTIONS ..............................................................................................................24 3.3 FFT APPLICATION OF WINDOW FUNCTIONS ......................................................................................24 4.0 FILTERS .........................................................................................................................................31 4.1 FILTER BACKGROUND THEORY .........................................................................................................31 4.1.1 Finite Impulse Response Filters (FIR).......................................................................................32 4.1.2 Infinite Impulse Response Filters (IIR)......................................................................................33 4.1.3 Filter Specification ....................................................................................................................35 4.2 ROUTINE DESCRIPTIONS ....................................................................................................................37 4.2.1 Fir ..............................................................................................................................................37 4.2.2 Fir1 ............................................................................................................................................39 4.2.3 Iir ...............................................................................................................................................41 4.2.4 Iir1 .............................................................................................................................................43 4.2.5 DIir ............................................................................................................................................44 4.2.6 DIir1 ..........................................................................................................................................49 4.2.7 Lms.............................................................................................................................................50 4.2.8 Lms1...........................................................................................................................................57 5.0 CONVOLUTION AND CORRELATION ...................................................................................59 5.1 BACKGROUND THEORY .....................................................................................................................59 5.1.1 Convolution ...............................................................................................................................59 5.1.2 Correlation ................................................................................................................................60 5.2 ROUTINE DESCRIPTIONS ....................................................................................................................61 5.2.1 ConvComplete............................................................................................................................61 5.2.2 ConvCyclic.................................................................................................................................62 Hitachi Micro Systems Europe Ltd ii DSPLib For SH Application Note (19-031/1.0) Contents 5.2.3 ConvPartial................................................................................................................................64 5.2.4 Correlate....................................................................................................................................65 5.2.5 CorrCyclic .................................................................................................................................67 5.3 CORRELATION EXAMPLE ...................................................................................................................68 6.0 MISCELLANEOUS FUNCTIONS ...............................................................................................71 6.1.1 GenGWnoise ..............................................................................................................................71 6.1.2 MatrixMult.................................................................................................................................72 6.1.3 VectorMult .................................................................................................................................73 6.1.4 MsPower ....................................................................................................................................74 6.1.5 Mean ..........................................................................................................................................74 6.1.6 Variance.....................................................................................................................................75 6.1.7 MaxI...........................................................................................................................................76 6.1.8 MinI ...........................................................................................................................................76 6.1.9 Peakl ..........................................................................................................................................77 7.0 DSPLIB IMPLEMENTATION .....................................................................................................79 7.1 HARDWARE .......................................................................................................................................79 7.1.1 Analogue to Digital Conversion ................................................................................................80 7.1.2 Digital To Analogue Conversion ...............................................................................................81 7.1.3 Circuit Description ....................................................................................................................82 7.2 SOFTWARE .........................................................................................................................................84 7.2.1 Main() ........................................................................................................................................84 7.2.2 system() ......................................................................................................................................86 7.2.3 adc(short*).................................................................................................................................87 7.2.4 detect(short*) .............................................................................................................................88 7.2.5 dac(short*).................................................................................................................................89 7.3 Timing...........................................................................................................................................90 APPENDIX ONE ..................................................................................................................................92 ENSIGMA SOFTWARE GUIDE FOR DSPLIB ...............................................................................................92 Hitachi Micro Systems Europe Ltd iii DSPLib For SH Application Note (19-031/1.0) Overview 1.0 Overview 1.1 General Description DSPLib represents a number of standard `off the shelf' fixed point DSP operations which can be used either stand alone or in series to form an optimised DSP process which will run on a microcontroller. The functions offered by the library come in five distinct groups. Fast Fourier Transforms Window Functions Filters Convolution & Correlation Miscellaneous The library offers each function in both ANSI C source code and also optimised SH assembler. The C code is provided only for debugging and reference. It is not recommended that the C code be used in a finished system due to the inefficiencies introduced by the C compiler. As well as the DSP functions, DSPLib offers a number of example programs to demonstrate to the user how easily DSPLib can be integrated into standard ANSI C code. In order to reduce development times, this document has been written to take the reader through each function indicating typical pitfalls, as well as hints to improve execution speed. The document forms a type of user guide for the main part, and towards the end indicates some hardware design considerations as well when designing with the Hitachi SH series. All the functions offered by DSPLib do not use any local static variables, thus enabling the functions to be called from interrupt routines without affecting the operation of the main program. In order to use DSPLib with the Hitachi SH series of microcontrollers, the user will require the Hitachi tool chain to build programs. This tool chain consists of the following :SH Series Cross Assembler SH Series C Compiler H Series Linkage Editor Hitachi Micro Systems Europe Ltd 1 DSPLib For SH Application Note (19-031/1.0) Overview Although a set of GNU tools are available for the Hitachi SH series, the optimised SH assembler in DSPLib is not compatible, and is therefore unsuitable. The ANSI C code can be compiled by the GNU tool, but will result in a very inefficient implementation. DSPLib is supplied on a 3.5" disk in MS-DOS format. The disk consists of a file ensigma.lib which contains the entire library ready built in object form. Also on the disk is a header file called ensigdsp.h, as well as the ANSI C and optimised assembler functions. Some example code is also supplied on the disk, details of which may be found in the related Ensigma documentation. A number of test routines are included to allow the user to verify correct operation. The directory structure of DSPLib is as follows :- When installing DSPLib, create a directory for the above tree to sit in. The Hitachi tools should be installed in a separate directory, and the path then set such that the include and lib directories of DSPLib are visible when building programs. To copy the library onto a hard drive called c:\, place DSPLib into drive A:\ and type the following :XCOPY A:\* C:\shc\shc2.0 /s Further details on installation and library building are included in the related Ensigma documentation. 1.1.2 Note About Data Types DSPLib makes use of a number of ANSI C data types. Representation of these types on an SH-1 device are as follows :Byte 8 bits Word 16 bits Long 32 bits Float 32 bits Double 64 bits Hitachi Micro Systems Europe Ltd 2 DSPLib For SH Application Note (19-031/1.0) Overview 1.2 Function Timings The execution times are based on a 20Mhz SH1 device, and should be adjusted accordingly for other clock speeds. 1.2.1 FFT Timings The timings are given below in milliseconds for the execution speed of the FFT functions. Size 128 256 512 1024 Not In Place Forward FFT Inverse FFT Complex Real Complex Real 1.94 1.06 2.19 1.20 4.43 2.37 4.94 2.66 9.98 5.29 11.00 5.88 22.23 11.70 24.28 12.88 Size 128 256 512 1024 In Place Forward FFT Inverse FFT Complex Real Complex Real 1.94 1.06 2.20 1.19 4.46 2.37 4.97 2.64 10.03 5.32 11.06 5.86 22.39 11.75 24.44 12.83 1.2.2 Filter Timings The timings are given below in microseconds for the execution speed of the filter functions. The multiple sample routines are quoted in a per sample format, computed by dividing the computational time for 100 samples by 100. No. Of Coeffs 5 10 100 No. Of Biquads 1 2 20 FIR 8.6 10.3 46.1 Function LMS FIR1 24.9 17.2 40.6 18.9 328.4 54.7 LMS1 34.7 50.4 338.2 IIR Function DIIR IIR1 DIIR1 40.1 77.6 752.8 48.0 86.4 784.3 10.5 19.7 185.5 18.1 28.8 221.4 Hitachi Micro Systems Europe Ltd 3 DSPLib For SH Application Note (19-031/1.0) Overview 1.2.3 Convolution & Correlation Timings The timings are given below in microseconds for the execution speed of the various convolution and correlation functions offered by DSPLib. The timings are quoted per output sample, thus found by dividing the time to produce n output samples by n. The table shows the computational time per output sample for a particular input array size iw. It should be noted that iw is always the smaller of the two input arrays. Size Of iw 5 10 100 ConvPartial 9.7 10.6 54.6 Routine ConvCyclic CorrCyclic 9.7 11.2 55.4 Correlate 9.0 10.2 47.7 8.1 9.0 46.4 NOTE :- Special Case In certain situations, computation takes place outside the input array boundaries. When this happens, the results formed from elements outside the boundary are not calculated. The effect of this is that the computational time decreases. An example of such a situation is when the two input arrays are the same size. When this happens, there is only one instance when the solution is formed entirely from elements inside the boundary. For all other instances, the number of computations per sample reduce. Computation outside the array boundaries can take place in the ConvComplete and Correlate functions only. Taking the extreme case, when both the input arrays are the same size, the timings per sample are given below. Once again, the timings were found by taking the time required to produce n output samples, and dividing by n. All timings quoted below are given in microseconds. Size Of iw 5 10 100 No. Of Output Samples 9 19 199 Routine ConvComplete Correlate 6.5 7.1 29.0 8.4 8.2 26.1 Hitachi Micro Systems Europe Ltd 4 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms 2.0 Fast Fourier Transforms (FFT) Introduction The Fourier Transform provides a method of performing frequency analysis on a given time domain signal. The FFT of a continuous time domain signal returns the corresponding frequency domain equivalent. The use of a ready designed software routine releases the applications engineer from the burden of complicated mathematics involved with spectrum analysis. As a result of this, the designer is able to concentrate directly on the development of the specific application. Real world applications of the Fourier Transform are common place, and include such examples as noise analysis (e.g. analysing E.M.C. compliance), spectrum estimation as well as numerous telecommunication processes. DSPLib provides routines which calculate the classical forward and inverse FFTs in Radix 2 form. Each routine is described in detail below, with examples of operation given where necessary. 2.1 FFT Background Theory The purpose of this section is to reiterate some basic FFT theory such that the reader will gain an appreciation of the function of each of the software routines described in this chapter. DSPLib provides basic building blocks that allow the calculation of Radix two decimation in time transforms. Radix two implies that the time domain input data is required in blocks that are a power of two. This means that an FFT cannot be performed on a sample by sample process, but instead must be presented in blocks of samples. The choice of size of the block of time domain samples presented to the FFT is determined by a number of factors such as:Frequency Content Of Signal Required Resolution In Frequency Domain Processor Time Available To Perform FFT Must Be Power Of Two (Radix Two) It is necessary for the designer to examine each of the above considerations and reach a favourable compromise in order to implement an efficient FFT for a specific application. Although in-depth mathematical knowledge of the FFT is not required to use theDSPLib FFT functions, an appreciation of the structure is desirable, thus enabling the designer to understand the required formats for data input and output, as well as understanding the results generated by the FFT. Hitachi Micro Systems Europe Ltd 5 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms The forward Fourier transform is defined by the following equation:N y(n) = 2 - s e -2 jn / N . x n n=0 Conversely the inverse transform may be written as :N y(n) = 2 - s e 2 jn / N . x n n=0 The exponential powers shown are known as the twiddle factors and are used to combine two data points within the FFT. To improve on computational time the twiddle factors are computed first and stored in a look-up table. It is desirable to ensure that the look-up table is stored in internal RAM in order to minimise memory access times and hence improve FFT performance. In the case of DSPLib the look-up tables are generated by executing the InitFft routine first. DSPLib offers two distinct structures of FFT. Both will return bit identical results, and both have near identical execution speeds, however the use of memory differs for the storage of results. 2.1.1 Not In Place FFT This version of the FFT takes the block of time domain samples from RAM, performs the FFT and returns the result to a new location in RAM. The structure of a not-in-place FFT can be seen in diagram 2.1 below. T im e D o m a in S a m p le s x (0 ) W0 F r e q u e n c y D o m a in S a m p le s X (0 ) W0 W0 x (4 ) X (1 ) x (2 ) X (2 ) W0 W1 W0 X (3 ) x (6 ) x (1 ) X (4 ) W2 x (5 ) W2 W0 X (5 ) x (3 ) x (7 ) X (6 ) W0 W2 W3 X (7 ) Diagram 2.1 : Not In-Place FFT Flowgraph Hitachi Micro Systems Europe Ltd 6 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms It should be noted from Diagram 2.1 that the inputs to the FFT are not in order. In order for the FFT to be performed, the data order must be re-arranged by way of bit reversal. Since this operation is performed by the SH, the bit reversal technique is implemented in software (unlike some DSP cores which have bit reversed addressing to enhance execution speed) by the library code, and is of no concern to the user. The output of the FFT is in order, and so no further modifying of the addresses of the samples is necessary. The weightings applied at each of the butterflies on the Flowgraph are the twiddle factors which were described earlier. The twiddle factors are stored in a look-up table in order to enhance speed. 2.1.2 In-Place FFT This version takes the block of time domain samples in memory, performs the FFT operation and deposits the results into the same memory locations in RAM. This method has the advantage of using less memory space (50% less), at the expense of execution time. The hardware arrangements for in-place FFTs require careful planning, since for each sample that is processed there are two memory accesses, a read and a write to the same location. To maximise speed, on chip RAM should be used. x(0) Stage 3 Frequency Domain Samples X(0) x(1) W0 Time Domain Samples Stage 1 Stage 2 x(2) W0 x(3) W0 x(4) W0 x(5) W0 x(6) W0 W2 x(7) W0 W2 X(4) X(2) W2 X(6) X(1) W1 X(5) X(3) W3 X(7) Diagram 2.2 : In-Place FFT Flowgraph As can be seen in the above flow diagram, bit reversal is still necessary with in-place FFTs. The above diagram shows bit reversal is necessary on the output data. In reality, the difference in execution time between the two functions is negligible. Therefore, the only real consideration when deciding on the implementation is the amount of on chip RAM that is likely to be needed. For maximum conservation of memory space, use the in-place implementation. Hitachi Micro Systems Europe Ltd 7 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms As with any process where time is important, it is recommended that the whole of DSPLib is executed from on chip RAM. 2.1.3 FFT Scaling When implementing an FFT, it is important to have an understanding of the signal being presented to it such that overflow does not occur at any stage. There are a number of different methods of preventing overflow, all based around the scaling of the input data. When deciding on the scaling applied, it is important to consider the quantisation introduced against the likelihood of overflow. The related Ensigma text describes in detail the method of scaling employed by DSPLib, and should be carefully considered before implementing a transform. For most applications the scaling EFFTALLSCALE (0xFFFFFFFF) will be suitable. This level of scaling simply performs halfing on every stage, with a guarentee that no overflow will occur. To summarise FFT scaling, here are the three levels offered by DSPLib. EFFTALLSCALE :- Halving performed on every stage, guaranteeing no overflow if the input samples are all less than 230 . EFFTMIDSCALE :- Halving performed on alternate stages, hence giving an output with the same power as the input. EFFTNOSCALE when using this. :- No scaling performed. Signal properties must be well understood 2.1.4 Input / Output Array Format DSPLib offers both real and complex FFT computation, and so the format and ordering of data arrays presented to the transform must be understood. Firstly, we will look at purely real arrays, as used in all the real forward and inverse transforms. The order of the array is as follows:x[2n] = Re{C(n)} x[2n+1] = 0 i.e. all elements are real on input all imaginary components are zero (real signal) For a real transform, the input array contains real time domain samples and so contains no phase information. The output from the transform is the frequency domain representation of the signal. The output of the FFT is in complex array format (see below), however, in the case of a real signal all the complex samples are zero. Each real sample represents the signals frequency content at a certain point. The highest sample, i.e. the last sample represents Fs/2. I.e. the FFT only returns frequency data for half the sampling frequency. Hitachi Micro Systems Europe Ltd 8 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms The complex transforms require data to be presented in complex array format, and similarly the results are returned in the same format. The ordering of a complex array is as follows:x[2n] = Re{c(n)} x[2n+1] = Im{c(n)} Real elements Imaginary elements I.e. the input and output elements are ordered :- {real, imaginary, real, imaginary ......} 2.2 Routine Descriptions 2.2.1 FftReal This function performs a purely real forward fast fourier transform. For a full description of the definition of the function, the Ensigma documentation should be consulted. The routine is defined as follows :int FftReal( output, input, size, scale) Where, short output[] short input[] long size long scale positive frequency output data real input samples size of FFT scaling applied to FFT Using FftReal The use of FftReal, as with any other function in DSPLib is relatively straight forward. The user must define the scaling and size of the function, as well as present input data in a suitable format. This example, and others in this section use the LogMagnitude function from DSPLib to present the output of the FFT in log magnitude format. Although the use of this function is included in the text, further details of this function are included in the Ensigma documentation. Example The following example outlines a simple application of the FftReal function, intended as a starting point for the reader when developing more specific applications. In this example, the signal applied to the FFT contains two frequency components with different amplitudes. The FFT is used to determine the frequency and amplitude of the components of the signal. Hitachi Micro Systems Europe Ltd 9 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms Data presented to the FFT must be 16 bits or less(including the sign bit), and in blocks that are a power of two (radix two operation). In our case, the input consists of 128 samples. The output will consist of 64 samples interlaced with zeros, confirming that the input is real, since all complex samples are null. It is necessary for the user to determine the size of the FFT, and also the scaling applied during calculation. Since there are 128 input samples a 128 point FFT is required for a single block computation. Referring to the FFT structure in diagram 2.1, the required size of FFT becomes apparent. In the diagram, an eight point FFT is shown, and is capable of processing data in blocks of eight. Therefore, in our case, the 128 input samples can be processed in one block using a 128 point FFT. The C code implementation is shown below, and shows how DSPLib is combined in a standard ANSI C program. The array INPUT has been created from an earlier source, either an ADC or generated by the SH in a routine. The samples in INPUT are transformed and written to the array OUTPUT. #include #include #include /* Typical Libraries To Be Included */ #define NSAM 128 #define NSAMS NSAM/2 /* Number Of Input Samples*/ /* Number Of Output Samples From LogMagnitude */ void main(void) { short OUTPUT[NSAM], LOGOUT[NSAMS]; float fscale; fscale = 1; /* Enter Code Here To Generate INPUT Data */ /* Generate Twiddle Factor Look-Up Tables */ if(InitFft (NSAM) != EDSP_OK) printf("Problem With Generating Look-Up Table"); /* Perform FFT Operation */ if(FftReal (OUTPUT, INPUT, NSAM, EFFTALLSCALE) != EDSP_OK) printf("Problem With Performing FFT Operation"); /* Log Magnitude Results Returned By FFT */ if(LogMagnitude( LOGOUT, OUTPUT, NSAM, fscale) != EDSP_OK) printf("Problem With LogMagnitude Routine"); } Hitachi Micro Systems Europe Ltd 10 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms The above example uses a total of three functions from DSPLib, and is centred around the FftReal function described above. InitFft calculates the twiddle factors and stores them in a look-up table which is used by FftReal during its execution. More details of InitFft can be found in the Ensigma documentation. LogMagnitude takes the complex output data from FftReal, and returns a purely real logarithmic result. The scale in LogMagnitude may be set to suit the application, and in this case is set to one. LogMagnitude can be found in the Ensigma text. Diagram 2.3 below shows a plot of the 128 samples contained in the array INPUT. The FFT processed these samples and returned the results to OUTPUT. Following the completion of FftReal, LogMagnitude calculated the log format of the results, and this can be seen in diagram 2.4 below. 1500 1000 500 0 1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106 113 120 127 -500 -1000 -1500 Diagram 2.3 :- 128 Time Domain Input Samples 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 -10 Diagram 2.4 :- Result Of FFT (64 Frequency Domain Samples) Hitachi Micro Systems Europe Ltd 11 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms The result of the FFT is shown in diagram 2.4 above. There are only two samples which are not zero, and these represent the two frequency components of the signal shown in diagram 2.3. It should be noted that the result of LogMagnitude is half the length (64 samples) of the original FFT result (128 samples). This is due to the fact that the real and imaginary samples are squared, added together and then logged. Analysis of the above plot in diagram 2.4 will give us detailed information on the spectrum of the signal in diagram 2.3. The plot is analysed as follows :The two peaks appear at samples 4 and 16. Since sample 64 is known to correspond to Fs/2 (half the sampling frequency), other samples may be calculated as follows to form a table :Frequency Fs/2 Fs/4 Fs/8 Fs/16 64/1 64/2 64/4 64/16 Sample 64 32 16 4 Hence, from the above table we know that our signal contains two components, one at Fs/4 and the other at Fs/16, which correspond to samples 16 and 4. Next it is necessary to determine the amplitude of those components. Fs/16 and Fs/8 have amplitudes of 20dB and 40dB respectively. Since the log of a signal is 20 logVpk, where Vpk is the peak amplitude of the signal, the original signal amplitudes may be found as follows:Amplitude of Fs/16 = anti-log (20/20) = 10 Also Amplitude of Fs/8 = anti-log (60/20) = 1000 So the result of our FFT process is the complete frequency content of the original signal up to and including Fs/2. Clearly higher in the spectrum is of no concern to us due to the aliasing effects of digitisation. The method for determining the amplitude of the components shown above is only applicable if EFFTALLSCALE is used, and if the scale in LogMagnetude is set to one. For most applications, this arrangement will be satisfactory and provide the simplest implementation. This example has omitted the use of windows when taking FFTs. Windows are used to compensate for the `end effect' encountered when taking the transform of a finite data set. The sudden end of the data generates many harmonics which strictly speaking are not present in the signal. The result of this is that the FFT represents frequency components which are of no interest to the user. Hitachi Micro Systems Europe Ltd 12 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms In order to counter effect this, the data is applied through a window, which has the effect of `fading' out the start and finish, thus reducing harmonics caused by straight edges. There are a number of different window shapes available, and the omittance of a window is referred to as rectangular windowing, and applies to the above example. DSPLib provides four windows, being Blackman, Hamming, Hanning and Triangle. These functions and their use are described in section 1.3.2 . 2.2.2 FftComplex This routine allows the user to present both real and complex data to the FFT for processing. The result of the FFT is in the same format as that in FftReal, only the complex values will be non-zero(assuming the complex input data is non-zero). The use of the routine is very similar to FftReal, and follows on directly from its description. It is recommended that the reader looks at the FftReal text in the previous section before continuing here. The complex FFT is essentially two separate FFTs, one for the complex data and one for the real. These FFTs are completely separate, and do not interact in anyway. FftComplex is not-in-place, and hence returns its solution to a different memory location to that of the input data. Diagram 2.1 shows the structure of a not-in-place FFT. The related Ensigma documentation gives detailed information regarding the definition of FftComplex, however there now follows a brief summary of the function for your reference. The routine is defined as follows :int FftComplex( output, input, size, scale) Where, short output[] short input[] long size long scale Complex Output Data Complex Input Data Size Of FFT Scaling Applied To FFT Use Of FftComplex FftComplex is essentially the same to use as FftReal, and so the previous example is applicable here. The difference between the two functions is that FftComplex allows complex data to be processed. Therefore, the input and output arrays are both in complex array format, and so the user should ensure familiarity with this. Hitachi Micro Systems Europe Ltd 13 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms When specifying the size of the FFT, it is important to remember that half the inputs are complex. Therefore, if the user has 128 real samples and 128 complex samples, and it is necessary to process the samples in one block, a 256 point FFT will be required. A block diagram representation of the above instance is shown in diagram 2.5. The diagram shows how two N point FFTs are required to process the real and complex data samples. Typically the complex samples are representations of the signals phase characteristics, but as explained below, this doesn't have to be the case. Complex FFT Real Input Samples x0,x2,x4,x6,....xn ... . N Point FFT ... . Real Output Samples y0,y2,y4,y6,....yn Imaginary Input Samples x1,x3,x5,x7,....xn+1 ... . N Point FFT ... . Imaginary Output Samples y1,y3,y5,y7,....yn+1 Diagram 2.5 :- Representation Of FftComplex Showing Two Separate FFTs As the above diagram shows, the FftComplex routine is actually two separate interlaced FFTs. This fact can be exploited by way of performing two real FFTs simultaneously. Real data can be applied to both the real and complex inputs, with the corresponding results available in the odd and even samples at the output. In certain applications, this method can offer time savings over performing two separate FftReal routines. 2.2.3 IfftComplex This function offers the exact opposite to FftComplex. IfftComplex calculates the inverse FFT and returns a not-in-place solution. The user presents the routine with a set of frequency domain samples in complex array format. The highest sample in the array must correspond to the frequency Fs/2s' imaginary component. If the signal is real, all the imaginary values (odd addresses) will be zero. The transform will return the equivalent time domain signal for the given frequency data. The time domain data will be in complex array format, and will require separating into real and imaginary if necessary. When applying data to IfftComplex, it is important that the samples are not in logmagnitude form, since this will generate an invalid return on the output. As with all the FFT routines in DSPLib, the user must specify scale and size. Full details of the routines description are in the related Ensigma documentation. Hitachi Micro Systems Europe Ltd 14 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms The routine definition is summarised below for your reference:- int IfftComplex( output, input, size, scale) Where, short output[] short input[] long size long scale complex output data complex input data size of inverse FFT scaling applied during IFFT execution Use Of IfftComplex The routine is very simple to use, and as described above is the exact opposite to the FftComplex routine described. One method of establishing correct operation is to perform both the forward and inverse transforms on a data set, then comparing the results to the original data input. If the results are different, there is a problem with one or both of the functions, such as overflow, excessive quantisation or simply scaling. An example of such a program is given below. #include #include #include /* Typical Libraries To Be Included */ #define NSAM 128 #define NSAMS NSAM/2 /* Number Of Input Samples*/ /* Number Of Output Samples From LogMagnitude */ void main(void) { short OUTPUT[NSAM], OUTPUTA[NSAM], LOGOUT[NSAMS]; float fscale; int I; fscale = 1; /* Enter Code Here To Generate INPUT Data */ if(InitFft (NSAM) != EDSP_OK) printf("Problem With Generating Look-Up Table"); if(FftComplex (OUTPUT, INPUT, NSAM, EFFTALLSCALE) != EDSP_OK) printf("Problem With Performing FFT Operation"); if(IfftComplex (INPUTA, OUTPUT, NSAM, EFFTALLSCALE) != EDSP_OK) printf("Problem With Performing IFFT Operation"); Hitachi Micro Systems Europe Ltd 15 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms for(i=0; I < NSAM; I++) { if((INPUT[i] - INPUTA[i]) != 0) printf("Problem With Format Of Results"); } } The above C code shows how both FftComplex and IfftComplex are used as functions in a program. The code performs a complex forward not-in-place FFT on the data stored in INPUT. The resultant frequency domain data then has an inverse FFT applied and the result stored in INPUTA. If both routines are operating correctly, the values stored in INPUTA should be identical to INPUT. This is tested by subtracting the two and comparing to zero. The result is compared to zero, and if different, an error message is displayed to indicate to the user that the functions are not operating correctly. The reader should note the use of the same scaling applied to both forward and inverse transforms. As discussed above, a number of factors can lead to the data being different, and most can be `ironed' out by the user during development. 2.2.4 IfftReal IfftReal presents a function which calculates the inverse not-in-place FFT. This function follows on from the above description of IfftComplex, with the exception of the format of the returned data. The user presents the routine with complex frequency domain samples, with the highest sample corresponding to Fs/2. The inverse transform then returns a purley real time domain representation of the input data. Further details regarding the definition of the function may be found in the related Ensigma documentation. For your reference the definition of the function is as follows:int IfftReal( output, input, size, scale) Where, short output[] short input[] long size long scale real output data complex input data inverse FFT size scaling applied Using IfftReal When using IfftReal, the input must be in complex array format. However, the signal returned from the function must be real, and therefore the input must represent a real signal. Therefore, the complex input array must represent a positive frequency, and so all complex array elements must be zero. Before calling this routine the twiddle factors should be generated by calling InitFft. Hitachi Micro Systems Europe Ltd 16 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms 2.2.5 FftInComplex This routine performs the same function as FftComplex which was described earlier in this section. The user should choose this routine when memory space is of the essence. FftInComplex calculates the complex forward FFT in-place. The flowgraph for an inplace FFT can be seen in diagram 2.2. The function is described in detail in the related Ensigma documentation. For your reference, the definition of the function is as follows:int FftInComplex( data, size, scale) Where, short data[] complex input and output data long size FFT size long scale FFT scaling applied Using FftInComplex As described this function is identical in operation to FftComplex except for the use of memory. The function takes the input data from memory (stack) performs the calculations, then returns the results to the SAME memory location. This enables the user to maximise the use of memory space, but at the expense of overwriting the input data. 2.2.6 FftInReal Not surprisingly this follows the same format as FftReal, except that the FFT is performed in-place. The user should refer to the section on FftReal for details of operation when using this routine, and also reference to the related Ensigma documentation would be useful. For your reference the definition of FftReal is as follows:int FftReal( data, size, scale) Where, short data[] real data input, complex data output long size FFT size long FFT scaling applied Using FftInReal The input to the FFT is real data only, where as the output is in complex array format with the complex array elements equal to zero. To allow the FFT to be calculated inplace, the Fs/2 frequency output is stored in the second element of the array. Hitachi Micro Systems Europe Ltd 17 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms This slightly complicates the output, and should be remembered when using. For more details consult the Ensigma documentation. 2.2.7 IfftInComplex This routine represents an in-place implementation of the complex inverse FFT. Its operation is essentially the same as IfftComplex, only with an in-place computation. As with all of the in-place routines, IfftInComplex should be chosen when memory usage is critical. The routine is described in the related Ensigma documentation on. However, for your reference, here is the definition of IfftInComplex:int IfftInComplex( data, size, scale) Where, short data[] complex input/output data long size FFT size long scale FFT scaling applied Using IfftInComplex Operation is identical to the IfftComplex routine described only the results of the inverse FFT are written to the same memory location as the input. Hence all input data is overwritten, but memory usage is optimised. As with any of the DSPLib routines, it is desirable to locate data in on chip memory. 2.2.8 IfftInReal This final routine in the FFT section implements a real inverse in-place FFT. Operation is identical to IfftReal described earlier in this section, with the exception of the in-place computation returning the results to the same memory location as the inputs. Full details of IfftInReal can be found in the related Ensigma documentation. For your reference the function is defined as follows:int IfftInReal( data, size, scale) short long long data[] Complex positive frequency data input, real data output size Inverse FFT size scale FFT scale applied Where, Using IfftInReal As described, this method offers in-place inverse real FFT calculation. For details on the use of this function refer to the section detailing IfftReal. Real output data is inplace and so overwrites the original input data. Use of this function should be limited to applications where memory usage must be optimised. Hitachi Micro Systems Europe Ltd 18 DSPLib For SH Application Note (19-031/1.0) Fast Fourier Transforms When applying complex data to IfftInReal, the Fs/2 value should be placed in the second element of the array where the imaginary component of zero would normally be stored. Although this might complicate matters some what, it enables the function to operate in-place. Summary This section is intended to give the reader an insight into the operation of each of the FFT functions. While keeping the descriptions as general as possible to suit most applications, the descriptions should enable the user to choose between functions when designing a particular application, as well as pointing out certain design considerations. The detail and theory in this document is not intended to form a reference for FFT design, but should instead provide a starting point from where further DSP research may begin. NOTE It is important to remember to initiate the twiddle factor tables before using any of the FFT routines. These tables are initialised by using the function InitFft in the DSPLib. Details regarding both this function and the logmagnitude function can be found in the related Ensigma documentation. Hitachi Micro Systems Europe Ltd 19 DSPLib For SH Application Note (19-031/1.0) Window Functions 3.0 Window Functions Introduction This section of DSPLib provides four window generators which may be used for a number of applications, but essentially are used for counter-effecting the `end-effect' experienced when taking the FFT of a finite data length. Each of the four routines provide efficient methods of calculating the window, and deposit the results into a user specified memory location. The particular window is then recalled when required and applied accordingly to the desired data. The four windows are described below, along with plots of their associated shapes. Since a window is a continuous signal between limits, it is necessary to divide the window into equally spaced samples. The number of samples used to represent the window is specified by the user when calling the function. Clearly the more samples used, the greater the accuracy of representation. 3.1 Function Descriptions The window generators in DSPLib are defined as follows :int GenBlackman( output, win_size) int GenHamming(output, win_size) int GenHanning(output, win_size) int GenTriangle(output, win_size) Where, short long output[] win_size array to containing window coefficients size of window N and length of output[] Each function will return EDSP_BAD_ARG if the user specifies a window size that is smaller than one. If the function executes correctly, the flag EDSP_OK is returned. These flags are intended for debug use, and will help the user identify problems in code development. The functions generate a window to a size specified by the user. The shape of the generated window can be seen in respective diagrams below. Each diagram shows the window when size was selected as 128. Therefore, 128 samples (or window coefficients) are used to represent the windows. Hitachi Micro Systems Europe Ltd 21 DSPLib For SH Application Note (19-031/1.0) Window Functions 35000 30000 25000 20000 15000 10000 5000 0 1 9 17 25 33 41 49 57 65 73 81 89 97 105 113 121 Diagram 3.1 :- Blackman Window The Blackman window for 128 samples is shown above in Diagram 3.1. It can be seen that the side lobes actually extend right down to zero, as well as having a wide centre lobe. Mathematically, the Blackman window is defined as follows:2n 4n w(n) = (215 - 1) 0.42 - 0.5 cos + 0.08 cos 0n #include #include #define WINLEN 128 /* Typical libraries */ /* MUST be included */ /* Define length of window */ void main(void) { short output[WINLEN]; /* Define array to contain window */ if(GenBlackman( output, WINLEN) != EDSP_OK) /* Generate window */ printf(" Problem Generating Blackman Window"); /* Report if error has occurred */ } When the above code is executed, the window values are stored in an array called output[] of length 128. Although there is not an upper limit to the size of window generated, care should be taken to ensure that the stack is sufficintly large to accomodate all the generated values, as well as other values being used in the programs execution. The execution speed of the routines is of less importance than other routines described in DSPLib. The window function is typically used to generate values which are stored in a look up table for use in a real time loop. Therefore an optimised version is not available in DSPLib. The running time of the routine can be improved by the removal of the argument checking section of the code. This is only recommended once the specified arguments are known to be correct and generate the desired output. 3.3 FFT Application Of Window Functions Before reading this section, the reader should ensure familiarity with the FFT functions offered by DSPLib. Section two of this document will provide the necessary background material relating to this example. As described earlier, window functions have a number of different uses ranging from filter design to FFT pre-processing. It is the latter that is discussed here. Hitachi Micro Systems Europe Ltd 24 DSPLib For SH Application Note (19-031/1.0) Window Functions The FFT of a finite data set generates harmonics which are not related to the actual signal, and so leads to false solutions. This effect is caused by the sudden end of the input data, resulting in straight edges at either end. As familiarity with Fourier series representation would confirm, a great number of harmonics (infinite for exactly vertical) are required to represent a straight edge. It is these harmonics which are quite correctly represented by the FFT. 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 -10 Diagram 3.5 :- Desired FFT Result Diagram 3.5 shows the desired result from the application of an FFT to a particular signal. The signal in question has two frequency components, one at Fs/4 and the other at Fs/32. These two frequencies correspond to samples 32 and 4 respectively. With the introduction of a finite data set, with a sudden end to the samples corresponding to a straight edge, the resulting FFT is as shown in Diagram 3.6 below. Clearly a great number of harmonics have been generated which spread right across the spectrum of interest. The two frequency components are visible, but are completely encompassed by harmonics. 70 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 Diagram 3.6 :- Result Of FFT On Finite Data Length Hitachi Micro Systems Europe Ltd 25 DSPLib For SH Application Note (19-031/1.0) Window Functions In order to counter effect the harmonics, the finite time domain signal is multiplied by a window of the same length as the data set. The result of this operation is shown below. The effects of different windows are shown, with varying degrees of success. The C code implementation of this example is given at the end of this section. 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 -10 Diagram 3.7 :- Result Of Blackman Windowing 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 -10 Diagram 3.8 :- Result Of Hamming Windowing (Continued Over) Hitachi Micro Systems Europe Ltd 26 DSPLib For SH Application Note (19-031/1.0) Window Functions 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 -10 Diagram 3.9 :- Result Of Hanning Windowing 60 50 40 30 20 10 0 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 -10 Diagram 3.10 :- Result Of Triangular Windowing As can be seen in Diagram 3.9, the Hanning window generates the best result. The window has removed most of the harmonics, leaving a clear frequency representation of the finite data. In most applications, this type of `trial and error' approach will suffice. The C code implementation is as follows :/** /** DESCRIPTION: This file Generates a sine wave, applies a Hamming window /** applies an FFT and then returns the solution in log format #include #include #include #include "testhead.h" Hitachi Micro Systems Europe Ltd 27 DSPLib For SH Application Note (19-031/1.0) #define NSAM #define NSAMS #define TWOPI 128 NSAM/2 6.283185307 Window Functions /* number of samples */ /***************** Exercise To Take The FFT Of Data *******************/ void main(void) { /** Declare Local Variables **/ short int float sinea1[NSAM], window[NSAM], signal[NSAM], sinea[NSAM], sineb[NSAM],fftouta[NSAM],outputa[NSAMS],outputb[NSAMS]; n, k, len, res_shift; scale, scalefft; scale res_shift len scalefft = = = = 1; 15; NSAM; EFFTALLSCALE /*** Initiate Twiddle Values ***/ if(InitFft (MAXSIZE) != EDSP_OK) printf("Problem With Initialisation Of FFT"); /*** Generate Sine Wave With Sudden End***/ if(sine == 0) { printf("Generating Sine Wave"); k = len / 64; for(n = 6; n < 122; n++) { sinea1[n] = floor(1000 * sin(TWOPI * k * n / len) + 0.5); } k = len / 4; for(n = 6; n < 122; n++) { sineb[n] = floor(1000 * sin(TWOPI * k * n / len) + 0.5); } printf("Done"); } /*** Add Sine waves ***/ for(n = 6; n < 122; n++) { sinea[n] = sinea1[n] + sineb[n]; } for(n = 0; n < 6; n++) { sinea[n] = 0; } for(n = 122; n < 128; n++) { sinea[n] = 0; } /*** Generate Windows ***/ if(GenTriangle( window, NSAM) != EDSP_OK) printf(" Problem With Window Generator "); /** Multiply Window With Signal **/ if(VectorMult( signal, window, sinea, NSAM, res_shift) != EDSP_OK) printf("Problem With Vector Mult"); /*** Take FFTs Of Sine Waves ***/ if(FftReal( fftouta, signal, NSAM, scalefft) != EDSP_OK) printf("Problem With FFT"); Hitachi Micro Systems Europe Ltd 28 DSPLib For SH Application Note (19-031/1.0) Window Functions /** Covert FFT result to log magnitude format **/ if(LogMagnitude( outputa, fftouta, NSAMS, scale) != EDSP_OK) printf("problem With LogMagnitude"); /** Release FFT RAM **/ FreeFft(); } The above C code listing brings together a number of different DSPLib functions, and provides a good example of DSP processing on the SH. The reader should be familiar with the FFT content of the program, having read section 2.0 of this document. Following the flow of the program, the first task was to generate the signal. Two sine waves, one of Fs/4 and one of Fs/64 are generated and then added together. The floor function used in the generation of the sine waves is contained in the math.h library. This function takes the solution of the equation to its' right, and rounds it to the nearest integer that is not greater than it. Hence the function rounds down. Next, six zeros are deliberately inserted into the start and finish of the data set inorder to create a sudden end to the signal, and hence generate many harmonics, as shown in Diagram 3.6. Following that, a window is generated using one of the functions discussed in this section. The particular window function can simply be changed to test for best results, as was the case in this example. A window is applied to the signal on a sample by sample bases. I.e. sample zero of the signal is multiplied by sample zero of the window, and so on until each element has been processed. This task is performed by the function VectorMult which is contained in DSPLib under the miscellaneous section. For more details regarding this function, the reader should consult section 6.0 of this document, as well as the related Ensigma documentation. Hitachi Micro Systems Europe Ltd 29 DSPLib For SH Application Note (19-031/1.0) Window Functions Once the signal has been windowed, the resultant waveform is applied to the FFT function. This function is described in detail in section two of this document, and is of no concern to us here. The solution from the FFT is converted to log format, and stored in an array for future use. The FFT chosen was `not-in-place', however an `in-place' implementation would also be suitable and save memory usage. The results shown in Diagrams 3.5 through 3.10 were generated by the given C code running on an SH-1 processor. Clearly for a real application, the signal would not be generated on chip, but would instead be read in from an external source such as an ADC. Notes on processing real time signals are included in section 7.0 of this document. Hitachi Micro Systems Europe Ltd 30 DSPLib For SH Application Note (19-031/1.0) Digital Filters 4.0 Filters Introduction This section of DSPLib provides a number of functions which allow the user to perform digital filtering on discrete time samples. Essentially the library offers three different filter networks, even though it contains eight filter functions. The three filters are as follows :Finite Impulse Response (FIR) Infinite Impulse Response (IIR) Least Mean Squares Adaptive Algorithm (LMS) DSPLib allows the user to implement the above functions on both blocks of data, or alternatively on a single sample process which is intended for real time applications. The accuracy of the LMS algorithm can be further increased by the use of DIIR which is a double precision IIR implementation allowing the user to specify 32-bit coefficients. This document is intended to provide the reader with an introduction to the implementation of digital filters on the SH processor. This chapter details some background theory on digital filters, but is not intended to form a comprehensive reference text. A certain level of DSP knowledge is assumed, and if the reader requires further information, a relevant DSP text should be consulted. 4.1 Filter Background Theory In order to ensure a full appreciation of the function descriptions later in this section, some basic filter theory is outlined here. The theory describes the difference between the filter types, as well as outlining the filter structures used in DSPLib. It is recommended that the reader follows this text before moving onto the particular function description of interest. The digital filter lies at the heart of DSP processing. Digital filters have found applications in many different areas, and their everyday use is common place. Digital filters can be found in equipment ranging from compact disc players to mobile telephones. Digital filters are implemented with many different structures, and numerous hardware platforms. Hardware filters are available which allow the coefficients to be loaded in from ROM, and clearly dedicated DSP cores allow efficient implementation. DSPLib allows the user to take advantage of all the features you would expect when using a RISC core, as well as utilising the SH MAC (Multiply and ACcumulate) register, DMA (Direct Memory Access) controller and on chip ADC capabilities which together make real time micrcontroller DSP a reality. Hitachi Micro Systems Europe Ltd 31 DSPLib For SH Application Note (19-031/1.0) Digital Filters 4.1.1 Finite Impulse Response Filters (FIR) DSPLib offers two implementations of FIR filters. functions is described later in the routine descriptions. The basic operation of these The FIR filter has a number of distinct advantages, in particular :i) FIR filters implemented in direct form (functions of past and present samples), they are always stable. ii) The phase response of an FIR filter is exactly linear. A filter that doesn't have linear phase will cause a phase distortion on the signal passed through it. Such distortion is unacceptable in certain applications such as audio, video and various types of data transmission. iii) As indicated in the following text, the implementation of an FIR filter is very simple. The SH series of microcontrollers have an architecture that is particularly suited to the implementation of such filters. The heart of the FIR filter is the MAC operation, which the SH series can perform in three CPU clock cycles. The FIR filter is described in the time domain as follows :N -1 y( n ) = h ( k ) x ( n - k ) k =0 Taking the Z transform of the above expression enables us to represent the FIR filter in the digital domain as follows :N -1 H ( Z ) = h( k ) Z - k k =0 As can be clearly seen, the entire FIR routine consists of multiplying past and present samples with the filter coefficients h(k), and summing the solutions. Hence the implementation of the filter is entirely MAC operations and delays. DSPLib implements the FIR filter by way of a transversal structure. This structure is shown below in diagram 4.1. x(n) Z-1 h(0) Z-1 h(1) Z-1 Z-1 h(2) h(3) Z-1 h(4) Z-1 h(5) h(6) y(n) Diagram 4.1 :- FIR Transversal Structure Hitachi Micro Systems Europe Ltd 32 DSPLib For SH Application Note (19-031/1.0) Digital Filters Diagram 4.1 shows the FIR filter structure as implemented by DSPLib. x(n) represents the input samples, and y(n) the resulting filtered output. The h(n) elements represent the filter coefficients, and it is these that determine the filters frequency response. For the DSPLib implementation, the coefficients should be 16-bit signed values. It is important to remember this when designing the filter as the quantisation will have a large influence on the overall response of the filter. DSPLib doesn't support any filter design methods. It is required that the user determine their own coefficients either by traditional hand calculation methods, or alternatively the use of one of many commercially available software design packages. The later is recommended since different design methods can be tried, and also the analysis of finite word length effects is often possible by the use of simulation before the actual filter is implemented on the SH-1. The implementation of an FIR filter, or indeed any type of digital filter depends upon the intended use. DSPLib provides two FIR routines, both implemented using a transversal structure. The difference in the routines is the execution. One routine takes a finite block of data and performs the filter operation upon it, whilst the second routine filters the data on a sample by sample process. Clearly the latter is more suited to real time applications, in particular situations where the sampling rate introduces harsh time constraints. More detail regarding the two functions is included in the routine description section of this chapter. Because the FIR filter has no feedback, the filter is inherently stable. However, a drawback to this advantage is that to gain a particular performance criteria, an FIR will be of a higher order to that of the IIR. Therefore, in general unless any of the properties described at the start of this section are to be exploited, the IIR filter is used. The IIR filter is described below in 4.1.2. 4.1.2 Infinite Impulse Response Filters (IIR) DSPLib offers four implementations of the IIR, all of which are described later in the routine description section of this chapter. The IIR filter has feedback, and it is this feedback which leads to the filters strengths and weaknesses. The IIR requires fewer coefficients than FIR for the same set of specifications. IIR applications are normally in areas where sharp cut-off is required, along with high speed and in turn a high throughput of data. Of course this type of performance doesn't come free, and the price we pay for the IIR performance is decreased stability. Care must be exercised when designing the filter to ensure that the filter doesn't become unstable and result in an oscillating output. The Z plane transfer function for the IIR is as follows :Y ( Z ) b0 + b1 Z -1 + b2 Z -2 H(z) = = X ( Z ) 1 - a1 Z -1 + a2 Z -2 Hitachi Micro Systems Europe Ltd 33 DSPLib For SH Application Note (19-031/1.0) Digital Filters If we take the inverse Z transform of the above function and then separate out into difference equations, we arrive at the following :w(n) = ( a1w(n - 1) + a2 w(n - 2) + x (n)) y(n) = (b0 w(n) + b1w(n - 1) + b2 w(n - 2)) Where w(n) represents a centre tap in the filter, x(n) is the input data and y(n) is the resulting filtered data. Coefficients for the filter are represented as a and b and are ordered accordingly when being presented to the filter. The coefficient ordering and filter scaling is discussed in the routine description section. All IIR filters in DSPLib are implemented using Direct Form II structure. This structure corresponds directly to the difference equations given above. The structure of this type of IIR filter is given below in Diagram 4.2 X(Z) W(Z) Z b0 Y(Z) -1 a1 b1 Z -1 a2 b2 Diagram 4.2 :- Direct Form II Second Order Biquad The above diagram shows a Direct Form II second order IIR biquad, which is the implementation used in DSPLib. If, as in most cases, a higher order filter is required, the biquads are cascaded. Hence when specifying your IIR filter order, it should be a factor of two. The IIR routine descriptions detail the implementation of the filters. Care must be exercised when specifying shifts applied to the data, but again the routine description section discusses this issue in detail. Hitachi Micro Systems Europe Ltd 34 DSPLib For SH Application Note (19-031/1.0) Digital Filters 4.1.3 Filter Specification Although actual filter design methods are not discussed in this document, it is however important that the reader has an understanding of methods for specifying filters. There are a number of excellent software filter design packages available, all of which require the user to specify the required response. Taking a low pass filter as an example, a filter is typically specified by the following parameters :Pass Band Ripple Stop Band Attenuation Transition Width Filter Order The frequency response of an ideal low pass filter is shown below in Diagram 4.3. The frequency response cuts of immediately when we reach the cut off frequency Fc. Of course in reality, this kind of filter is not realisable, and so certain compromises must be reached wen arriving at an acceptable specification. Gain(dB) 0dB Desired Filter Response Fc Frequency(Hz) Diagram 4.3 :- Ideal Low Pass Filter Because a finite number of filter coefficients are used in practice to specify a filter, and also the degree of quantisation introduced to both the coefficients and the data, a number of errors are introduced. Diagram 4.4 shows a practical filter response, and enables the user to gain an understanding of which parameters need to be specified when designing a filter. Hitachi Micro Systems Europe Ltd 35 DSPLib For SH Application Note (19-031/1.0) Digital Filters The order of the filter has a great bearing on the response of the filter, with small order filters having a poor response. If a sharp cut off is required, i.e. a small transition width, then the filter order must be very high. Clearly this introduces a number of factors which are limiting. Firstly, a high order filter requires a large amount of computation due to its long length, this in turn introduces two problems. The length of the filter means that a long delay is introduced as the data ripples through the filter. Secondly, for each sample, a very large number of computations are required, and so the processing time is increased, hence reducing the real time capability of the system. As well as computational problems, large filters introduce another problem. If a large number of coefficients are used, a large amount of memory is required to store both the coefficients and previous samples. Memory is expensive, and also introduces a limitation in speed due to the large number of memory accesses required. When designing a filter it is necessary for the engineer to reach a number of compromises. The ideal response is played off against speed and cost, with a suitable system eventually being derived. As well as the transition width expanding with a decrease in filter order, several other effects take place. The pass band region develops ripple, and the stop band attenuation becomes less. Clearly the designer should have a feel for the acceptable levels of ripple and attenuation such that an optimised filter may be designed. A practical filter response showing these effects is shown below in Diagram 4.4. Gain(dB) Passband Ripple 0dB Stopband Attenuation Frequency(Hz) Passband Transition Band Stopband Diagram 4.4 :- Actual Low Pass Filter Response As can be seen above, the actual response is very different to that desired. Typically when specifying a filter, the user indicates the acceptable level of ripple in the passband, the transition width and the required stopband attenuation. Hitachi Micro Systems Europe Ltd 36 DSPLib For SH Application Note (19-031/1.0) Digital Filters The software design package will then normally suggest a filter order which would enable the stated criteria to be satisfied. Clearly if the user requires a filter with low pass band ripple, a small transition width and high stop band attenuation, a high order filter is required. 4.2 Routine Descriptions 4.2.1 Fir The function FIR implements a Finite Impulse Response filter. The FIR filter has a transversal structure which is described in section 4.1.1 of this document. The related Ensigma documentation takes the reader through the definition of the function, but for the readers reference, the function is defined as follows :int Fir(output,input,no_samples,coeff,no_coeffs,res_shift,workspace) Where, short short long short long int short output[] input[] no_samples coeff[] no_coeffs res_shift *workspace output samples y input samples x number of samples N to be filtered array containing filter coefficients h length of filter, thus number of coefficients K right shift applied to each output pointer to filter memory Using Fir the use of the filter routines is relatively straight forward. It is important to initialise the filter memory (on chip), before calling the routine. The memory is initialised by calling the routine InitFir. This routine sets aside a section of on chip RAM for the storage of coefficients and also past samples which are required for FIR computation. Coefficient Storage Coefficients should be calculated as 16 bit fixed point. The calculated coefficients should then be stored in an array which can be seen by the compiler. Assuming that the array is called coeff, a typical file would follow the following format :short coeff[10] = {0,0,12,15,16,16,15,12,0,0}; Clearly when a large number of coefficients are used, it is good practise to store the coefficients in their own file, and include the array file name in the top of your main program. Hitachi Micro Systems Europe Ltd 37 DSPLib For SH Application Note (19-031/1.0) Digital Filters Example The following example outlines the use of the Fir routine. The example is written in C and is intended to provide the reader with a typical code layout. #include #include #include #include "testhead.h" /*** Include File Containing data to be filtered ***/ #define NSAM 128 /* number of samples */ #define no_coeff 32 #define MAXSUM 67108864 void main(void) { /*** Declare Local Variables ***/ short short int long coeff[no_coeff], filout[NSAM]; *work; n, res_shift; sum; /*** Array Containing Coefficients ***/ coeff[32]={0, 0, 9, -1, 62, -253, 99, -168, 1283, -1017, 75, -3560, 5196, -237, 7506,-26122, 17126, 17126, -26122, 7506, -237, 5196,-3560, 75, -1017, 1283, -168, 99,-253, 62, -1, 9}; res_shift = 16; /*** Shift Applied To Output Data ***/ /*** Check Scaling Of Coefficients ***/ for(n = 0; n < no_coeff; n++) { sum += fabs(coeff[n]); } if(sum >= MAXSUM) printf("Bad Coefficients"); /*** Initiate Filter Workspace ***/ if(InitFir(&work, no_coeff) != EDSP_OK) printf("Problem With InitFir"); /*** Fir Filter The Signal ***/ if(Fir( filout, signal, NSAM, coeff, no_coeff, res_shift, work) == EDSP_BAD_ARG) printf("Filter Problem"); } Hitachi Micro Systems Europe Ltd 38 DSPLib For SH Application Note (19-031/1.0) Digital Filters The above C code example shows a typical filter routine for processing blocks of samples. In this case, the routine takes a block of 128 samples, and performs a filter operation on them accordingly. The filter is of length 32, and the pre-calculated samples are stored in the array coeff[] at the start of the program. Before the filter is initialised, the coefficients are checked for scaling. It is suggested as a general rule of thumb that the absolute sum of the filter coefficients is less than 226. Checking this should help to avoid saturation, as long as the result shift is set accordingly. Obviously, in an environment where time is an issue, it is desirable to perform such checks during development, and remove the code for the final version when the routine is known to be working. Next, the filter workspace has to be initialised. The routine InitFir() sets aside an area of on chip RAM for the filter routine to use as a scratch pad. The actual area is undefined, and will vary on each execution. It is therefore recommended that the user does not attempt to access this area at any time. When calling this routine, a pointer must be specified to the area, in our case the pointer `work' is used. This pointer must be specified when calling the actual filter routine. Finally in our C code example, the FIR filter routine is executed. The function Fir is called, with the following arguments:filout() = Samples from filter signal() = Input samples for filter NSAM = Number of input/output samples coeff() = Filter coefficients no_coeff = Number of filter coefficients (length of filter) res_shift = Shift applied to output work = Pointer to filter workspace in RAM These arguments tie in with the Fir definition at the start of this section. If the arguments are outside of a particular range, the filter will not operate, and will instead return a bad argument flag. This flag, called EDSP_BAD_ARG is tested in the above example, and its' generation in turn is used to generate an error message to the user. The flag EDSP_BAD_ARG is generated if the following occurs :no_samples no_coeffs res_shift res_shift < <= < > 1 2 0 27 4.2.2 Fir1 This routine follows on directly from the above description of Fir, and it is therefore recommended that section 4.2.1 is read before moving on to this section. Fir1 is identical to Fir except for the fact that it processes one sample at a time. Hitachi Micro Systems Europe Ltd 39 DSPLib For SH Application Note (19-031/1.0) Digital Filters This routine is optimised for signal sample computation, and should be used instead of the above routine when `no_samples' is set to one. The routines definition is outlined below :Fir1(output,input,coeff,no_coeffs,res_shift,workspace) Where, short *output Pointer to single output sample y(n) short input single input sample x(n) short coeff[] array containing filter coefficients h long no_coeffs number of coefficients K (length of filter) int res_shift right shift applied to output short *workspace pointer to filter memory Fir1 will give bit identical results to Fir, and uses the filter workspace to hold previous samples that are necessary for computation. The routine finds applications in areas where real time processing is required, and as a result the filter operation is applied on a sample by sample bases. Since this function is essentially identical to the Fir routine, coefficients are stated in an identical manner to that shown in 4.2.1. Also, for the purpose of a software example, the code shown in 4.2.1 is relevant. For an identical filter as that shown, the above code may be re-written as follows :- /*** Fir Filter The Signal ***/ if(InitFir(&work, no_coeff) != EDSP_OK) printf("Problem With InitFir"); for(n=0; n < NSAM; n++) { if(Fir1( filout, signal[n], coeff, no_coeff, res_shift, work)==EDSP_BAD_ARG) printf("Problem With Filter") } The above example is a `snippet' of code which could replace the filter section of the example code shown in 4.2.1. The code uses a for loop to filter a number of samples one at a time. Clearly, in a real time environment, the loop would be omitted, and the filter operation would be initiated by the end of the ADC conversion, as an example. It should be noted that before using Fir1, the filter workspace should be initialised by calling InitFir. This is the same initialisation routine as that used for Fir. In order to minimise quantisation noise and saturation, the same scaling rules apply as that discussed earlier, being to keep the absolute sum of the coefficients to a level lower than 226. The resultant shift should also be set to 26 if overflow is considered to be a problem, however in most cases this simply isn't an issue. Hitachi Micro Systems Europe Ltd 40 DSPLib For SH Application Note (19-031/1.0) Digital Filters 4.2.3 Iir Now we move onto IIR filters. The reader should be familiar with the basic concepts of IIR filters before reading the following routine descriptions. Section 4.1.2 of this document takes the reader through the basic ideas behind IIR filters, as well as explaining the filter structure for implementing IIR, and in particular the Direct Form II, as used in the DSPLib routines. As described above, Iir is implemented using Direct Form II biquads. A biquad is a single second order filter stage which may be cascaded in order to form higher order filters. Now because second order filters are cascaded to implement the higher order designs, it is important for the reader to remember that the stated filter order must be a factor of two. The Iir function is defined as follows :int Iir(output,input,no_samples,coeff,no_sections, workspace ) where, short short long short long short output[] input[] no_samples coeff[] no_sections *workspace output samples input samples number of samples to be filtered filter coefficients number of second order sections pointer to workspace Coefficient Storage The definition is similar to that shown in the Fir descriptions earlier. However, due to the differences in structure, the Iir coefficients are specified in a very different manner. The filter coefficients are stored in a single array. By looking at the transfer function of a single second order biquad, we can see that there are two groups of coefficients for each stage, being an and bn. H(z) = Y ( Z ) b0 + b1 Z -1 + b2 Z -2 = X ( Z ) 1 - a1 Z -1 + a2 Z -2 Hence, as can be seen from the transfer function, for each second order biquad, the user must specify five coefficients. However, another factor is thrown into the equation which further complicates things. The Iir functions in DSPLib can have a shift applied to them at each stage, and the shift applied is stated in the same file as the coefficients. The Iir difference equations for Iir should make things clearer. Hitachi Micro Systems Europe Ltd 41 DSPLib For SH Application Note (19-031/1.0) Digital Filters The equations are as follows :w(n) = ( a1 w(n - 1) + a2 w(n - 2) + x (n)).2 -15 y(n) = (b0 w(n) + b1 w(n - 1) + b2 w(n - 2)).2 - a0 k If you are not familiar with these equations for the second order biquads, you should read section 4.1.2 before preceding here. As can be seen above, each of the two `a' coefficients have a 15 bit right shift applied, which means that the `a' coefficients MUST be in Q15 format in order to ensure accurate results. The `b' coefficients are subject to a user specified shift, which as described is contained in the same array as the coefficients. I.e. in the above equation if the'b' coefficients are in Q15 format, a0k should be set to 15. Hopefully, it therefore becomes clear that the a0k term describes a shift rather than a coefficient. The k part of the term describes which cascaded biquad the term belongs to. Hence, if two biquads were used, the coefficients would be specified as follows :{a00,a10,a20,b00,b10,b20,a01,a11,a21,b01,b11,b21} It should now be clear that in the above array, the terms a00 and a01 are not coefficients, they represent a right shift which is applied to the `b' coefficients. Example There now follows a basic example of how the Iir function is typically used. The coefficients have been designed on a separate package and have been quantised to 16 bit resolution. The `a' coefficients are in Q15 format, and the `b' coefficients are in Q15 format, with the shift set to 15 as well, thus returning the solution in the correct form. The C code example is as follows :#include #include #include #include "testhead.h" /*** Include File Containing data to be filtered *** #define NSAM 128 #define no_sections 2 #define no_coeff 12 /* number of samples */ void main(void) { Hitachi Micro Systems Europe Ltd 42 DSPLib For SH Application Note (19-031/1.0) Digital Filters /*** Declare Local Variables ***/ short short coeff[no_coeff], output[NSAM]; *work; /*** Array Containing Coefficients ***/ coeff[no_coeff]={15,-29186,-11663,9008,18009,9008,15,-27423, 28914,14004,27934,14004}; /*** Initiate Filter Workspace ***/ if(InitIir(&work, no_sections) != EDSP_OK) printf("Problem With InitFir"); /*** IIR Filter The Signal ***/ if(Iir( output, signal, NSAM, coeff, no_sections, work) == EDSP_BAD_ARG) printf("Filter Problem"); } The above C code takes the reader through a very simple example of how to integrate the Iir function within a program. The coefficients are included in the program for clarity, but could just as easy be in a different file that is pointed to by an include statement. The two a0 values of 15 can be easily seen in amongst the coefficients, which constitute a shift right. The filter workspace is initiated by calling InitIir. In a time critical system, this operation should be performed during the initial set-up, and not included in the main loop. The filter only has to be initialised once for a particular filter size, and is completely independent of the number of function calls made to Iir. The final part of the program details the actual calling of the function. The data is taken from the array `signal', which is omitted from the above code. In reality, `signal' would be a series of time domain samples stored in its' own file pointed to by an include statement. The function is tested for bad arguments, and returns an error message if there is a problem. 4.2.4 Iir1 This function follows directly on from the above description of Iir. Iir1 offers an IIR filter which performs a filter operation on a single sample, thus making it particularly suitable for real time operations. The function is defined as follows :int Iir1( output, input, coeff, no_sections, workspace) Where, short *output short input short coeff[] long no_sections short *workspace pointer to filtered output sample input sample filter coefficients number of second order filter sections pointer to start of filter workspace Hitachi Micro Systems Europe Ltd 43 DSPLib For SH Application Note (19-031/1.0) Digital Filters The function is used in exactly the same manner as Iir. Filter coefficients are specified in the same format as that described in 4.2.3, and the method of applying a shift to the output is also identical. When the user wishes to use Iir with no_samples set to one, this function should be used instead. Iir1 is optimised for single sample operation, and will offer some computational savings over Iir in this particular situation. The software example given in the previous example is directly relevant here. For the same routine to be implemented using Iir1, the following change would be made. /*** IIR Filter The Signal ***/ for( n=0; n <= NSAM; n++) { if(Iir1(output,signal, coeff, no_sections, work) == EDSP_BAD_ARG) printf("Filter Problem"); } I.e the filter section of the routine has been replaced by a for loop which repeatedly runs the Iir1 routine. Clearly in a real application it is unlikely that the function would be used in this manner, but it serves as an example of the function operation. This routine will return bit identical solutions to the example using Iir. Before calling Iir1, it is important to initialise the filter workspace. Although the filter performs computation on one sample at a time, the filter must store previous input and output samples in order to perform the filter operation. Therefore, as with all the filter routines in DSPLib, the user must initialise the filter workspace. When using Iir or Iir1 the function InitIir must first be called. When calling this routine, the user must provide a pointer to the start of the workspace, and also indicate the size of the filter. The example in section 4.2.3 shows the use of InitIir, and is directly relevant to the use when calling Iir1. 4.2.5 DIir This function implements an IIR filter with a direct form II structure. The filter is made up of a series of cascaded second order biquads. If the reader is not familiar with the structure of an IIR filter, they must consult section 4.1.2 before continuing here. This routine offers a filter which is essentially the same as that offered by Iir, except that it allows double precision computation. This added precision is achieved by the inclusion of 32 bit coefficients. Input and output data is still 16 bit, but the added precision of the coefficients will increase the accuracy of the filter in situations where accuracy is an issue and could possibly affect filter stability. Hitachi Micro Systems Europe Ltd 44 DSPLib For SH Application Note (19-031/1.0) Digital Filters The related Ensigma documentation explains how the double precision is realised, and it is the purpose of this document to demonstrate how to use the routine. DIir is defined as follows :int Diir( output, input, no_samples, coeff, no_sections, workspace) Where, short short long long long long output[] input[] no_samples coeff[] no_sections *workspace filter output samples filter input samples number of samples to be filtered 32 bit filter coefficients number of second order sections (biquads) pointer to filter workspace Using DIir The use of DIir is essentially the same as using Iir, only the coefficients are 32 bit. Before calling DIir it is important to initialise the filter workspace. This is achieved by calling the function InitDIir. When calling this function, the user must specify the number of second order sections in the filter, and also provide a pointer to the area in RAM which this function sets aside for the filters use. The reader can see this function in the C code example below. It is important that the user doesn't try to access the filter workspace at any point for two reasons. First of all, the area is undefined, and so the user has no idea of the location or size of the area before or during execution. Secondly, the filter will access the RAM to store past samples. The user does not have access to the method of storage, and so intervention could result in the failure of the filter. Coefficient Storage Since this function implements an IIR filter, the method of storing coefficients is similar to that shown in section 4.2.3. However, due to the double precision of the coefficients, the method of applying shifts to the results is different. In order to understand the filters operation, we must first look at the double precision filter difference equations. w(n) = [a1k dk (n - 1) + a2 k dk (n - 2 ) + 2 31 x (n)].2 -31 y(n) = [b0 k dk (n) + b1k dk (n - 1) + b2 k dk (n - 2)].2 - a0 k The above equations are essentially the same as those given for the Iir function. The coefficients are stored in a single array in the same format, which is outlined below. {a00,a10,a20,b00,b10,b20,a01,a11,a21,b01,b11,b21............} Hitachi Micro Systems Europe Ltd 45 DSPLib For SH Application Note (19-031/1.0) Digital Filters The area where this function differs is the specification of shifts applied to outputs of biquads. As the difference equations show, the `a' coefficients must be specified in Q31 format. The output of each biquad can be shifted by the use of a0k and it is important that the reader fully understands how the shifts work. When the 16 bit input x(n) is applied to the filter, it is firstly multiplied by 216 to put it into Q31 format which is suitable for the filter arithmetic. This shift must be compensated by the user at the output in order to gain accurate results. The following two block diagrams should help to clarify matters. a1,a2 Coeffs x(n) 2 16 b0,b1,b2 Coeffs 2 31 2 15 2 -31 2 -31 y(n) DIir Filter Diagram 4.5 :- Shifts Applied In A Single Second Order Biquad Referring to the above diagram, the method of applying shifts to a single second order biquad can now be explained. The input samples, x(n) are immediately multiplied by 216 by the function, the user has no control over this. Next, all the `a' coefficients must be specified in Q31 format, as they are multiplied by 2-31 by the filter function, and again the user has no control over this. Finally we introduce the `b' coefficients, which are multiplied by a user specified shift. This shift, a0k must not only compensate for the format of the `b' coefficients, but also compensate for the original shift applied to the input data. In the diagram above (4.5), the coefficients are stored in Q15 format. The final result shift applied is therefore 2-31. The reason for this is as follows : The coefficients are in Q15 format, which means that they have been multiplied by 215 to put them in correct form. Also the shift has to compensate for the original 216 shift applied at the start of the process. Therefore we arrive at the required shift of 2-31 which will compensate for both shifts and return the solution in the correct format, whilst allowing the processor to perform double precision computation. Hitachi Micro Systems Europe Ltd 46 DSPLib For SH Application Note (19-031/1.0) Digital Filters If the filter order is higher than two, i.e. more than one biquad is used, the compensation doesn't have to be applied until the last stage, thus allowing the intermediate stages to be expressed in Q31 format. x(n) 216 a1,a2 Coeffs b0,b1,b2 Coeffs a1,a2 Coeffs b0,b1,b2 Coeffs 231 231 231 231 2-31 2-31 2-31 2-47 DIir Filter DIir Filter Diagram 4.6 :- Shifts Applied In A Fourth Order Filter The function allows the final output stage shift to be expressed right up to and including 48. This means that on a fourth order system and higher, the filter can be a true dual precision. In the example above, it is clear to see that the final shift applied is 47, which compensates for the Q31 format of the `b' coefficients, and also compensates for the original 216 shift applied at the start. It is very important to remember that such a large shift is only possible on fourth order filters and higher, and as a result, it is not possible with this function to implement a true second order double precision filter. Example The best way to reiterate the above theory is by way of an example. This example is written in C as are all the examples in this text. The example shows the array containing the filter coefficients, as well as the function calls to implement the filter. The filter coefficients were calculated on a design package, and implement a fourth order low pass filter. The values obtained from the package were as follows :- Design Package Solution a10 = 0.6149292 a20 = 0.4001465 b00 = 0.1794281 b10 = -0.3097076 b20 = 0.1794281 Q31 Format 1320550402 859308065 385318910 -665092006 385318910 Hitachi Micro Systems Europe Ltd 47 DSPLib For SH Application Note (19-031/1.0) a11 = -0.2411804 a21 = 0.8682861 b01 = 0.7138977 b11 = -0.7098694 b21 = 0.7138977 Digital Filters -517930965 1864630202 1533083637 -1524432929 1533083637 Hence the result of the filter design operation is shown on the left, and the Q31 format value is shown on the right. These values are then used in the double precision filter example outlined below. The reader should take particular notice of the coefficient array, and especially the a0 values which describe the shift applied during the filter operation. #include #include #include #include "testhead.h" /*** Include File Containing data to be filtered ***/ #define NSAM 128 #define no_sections 2 #define no_coeff 12 /* number of samples */ void main(void) { /*** Declare Local Variables ***/ short short coeff[no_coeff], output[NSAM]; *work; /*** Array Containing Coefficients ***/ coeff[no_coeff]={31, 1320550402, 859308065, 385318910, -665092006, 385318910, 47, -517930965, 1864630202, 1533083637, -1524432929, 1533083637}; /*** Initiate Filter Workspace ***/ if(InitDIir(&work, no_sections) != EDSP_OK) printf("Problem With InitDIir"); /*** DIIR Filter The Signal ***/ if(DIir( output, signal, NSAM, coeff, no_sections, work) == EDSP_BAD_ARG) printf("Filter Problem"); } The above example filters a set of data samples 128 long using the double precision IIR filter. Before calling the filter routine, the filter workspace is initialised by calling InitDIir. This function requires two arguments, one being a pointer to the workspace, and the second being the number of second order sections the filter contains, thus indicating to the function the size of RAM required. Hitachi Micro Systems Europe Ltd 48 DSPLib For SH Application Note (19-031/1.0) Digital Filters The filter coefficients are stored in an array in the main program, and this should clearly demonstrate the shifts which are required. All the coefficients are in Q31 format (the `a' coefficients must be in Q31 format), and the two shifts applied are 31, and 47. The shift of 47 compensates for the 16 place shift applied at the input as described earlier. The actual filter routine is called at the end of the program, and will filter the block of 128 samples, and then finish. The actual calling of the routine and arguments used is identical to Iir described in 4.2.3, except that the solution although still 16 bit is of a higher precision. 4.2.6 DIir1 This function offers exactly the same filter operation as that described in 4.2.5. The function implements a double precision IIR filter, but only processes a single sample at a time. This makes DIir1 particularly suitable for real time operation where the user wishes to perform computation on a sample by sample bases. The function is defined as follows :int DIir1( output, input, coeff, no_sections, workspace) Where short *output short input long coeff[] long no_sections long *workspace pointer to output sample input sample array containing 32 bit coefficients number of second order sections pointer to start of filter workspace The filter is used in exactly the same manner as DIir, and the reader should consult section 4.2.5 for full details of the operation. The method for specifying the filter coefficients is also exactly the same, i.e. they are all 32 bit and require the same methods of applying shifts as that described earlier. If the user wishes to use DIir with `no_samples' set to one, this function should be used instead. DIir1 has been optimised for single sample computation and will provide an increase in speed compared to DIir. To modify the C code example given in the previous section such that DIir1 can be used, the following change would be made. /*** DIIR Filter The Signal ***/ for( n=0; n <= NSAM; n++) { if(DIir1(output,signal, coeff, no_sections, work) == EDSP_BAD_ARG) printf("Filter Problem"); } Hitachi Micro Systems Europe Ltd 49 DSPLib For SH Application Note (19-031/1.0) Digital Filters As should be clear from the example code above DIir1 is the same as DIir only the user does not have to specify the number of samples to be processed since it only processes one at a time. Before calling DIir1, it is essential that the filter workspace is initialised. This is achieved in exactly the same manner as that for DIir, that is by calling InitDIir and specifying the number of second order sections the filter contains, and also providing a pointer to the assigned memory location. The use of this initialisation function can be seen in the software example in section 4.2.5. The code above is intended only to indicate the operation of the function. Imbedding the function in a For loop will actually increase the run time compared to the use of DIir, and so is not recommended. However, for evaluation purposes, the user will find that both DIir and DIir1 will return bit identical results. Even though the filter only processes one sample at a time, it is necessary to store past samples to feedback into the filter. These samples are stored in the filter workspace, and the very oldest samples are discarded as the number of single sample computations increase. As a final note in this section it should be pointed out to the user that using the double precision routines will most probably not be necessary, and will impose a time constraint. In most applications, the accuracy of the IIR functions will be suitable, and due to the simpler nature of the implementation, the IIR functions run an order of magnitude faster on the SH1. The use of the double precision filters should be contained to areas where the user feels that single precision will adversely affect the filter operation and lead to problems such as instability. As an example of the speed difference, on a 10MIPS SH-1, Iir1 takes 18.1uS/sample where as DIir1 takes 48.0uS/sample. 4.2.7 Lms This is the final filter structure offered by DSPLib, and comes in two forms LMS and LMS1. This section describes the background behind the LMS filter, and also includes an example of the routine performing a filter operation. LMS stands for Least Mean Squares, which in turn describes the algorithm which is used in this adaptive filter. The LMS adaptive filter is based around an FIR filter, and it is assumed that the reader is familiar with its' basic operation. If this is not the case, the reader should consult section 4.1.1 of this document which explains the basic concepts behind the FIR, as well as showing the transversal structure which is incorporated into this adaptive algorithm. The function Lms is defined as follows :int Lms( output, input, ref_output, no_samples, coeff, no_coeffs, res_shift, conv_fact, workspace) Hitachi Micro Systems Europe Ltd 50 DSPLib For SH Application Note (19-031/1.0) Digital Filters Where, short short short short short long int short short output input ref_output no_samples coeff[] no_coeffs res_shift conv_fact workspace output samples y(n) input samples x(n) desired output values d(n) number of samples to be filtered filter coefficients h(n) number of coefficients right shift applied to output convergence factor 2 pointer to filter workspace The LMS routine provides a filter that will adapt to a reference signal given to it. This means that no filter design is necessary, the filter will adjust its own coefficients using the LMS algorithm such that the reference signal provided to it is included from the main filter signal. As an example, if a filter is supplied with a reference sine wave of frequency Fs/4, and an input signal containing a number of sine waves such as Fs/2,Fs/4,Fs/8 and Fs/16, the output of the filter would be Fs/4. Hence the filter has adapted its coefficients such that the Fs/4 component of the signal is included. The LMS filter is mathematically described as follows :FIR Filter : K -1 y(n) = hn ( k ) x (n - k ) k=0 LMS Adaptive Algorithm :hn +1 (k ) = hn ( k ) + 2 e(n) x (n - k ) Where h(k) are the filter coefficients which are adapted. determines the convergence rate, i.e. the rate at which the coefficients converge and provide the required output. Clearly the faster the convergence rate, the less accurate the output. It is up to the user to arbitrarily test different values until a suitable response is achieved. The same scaling conventions apply to this filter as with the Fir routine. However, it is not guaranteed that the adaptive algorithm will keep to the convention, and saturation may occur. If this is the case, it is most likely that the filter is having trouble adapting, and the filter order should be increased to rectify this. Diagram 4.7 shows a block diagram showing the implementation of the filter. Hitachi Micro Systems Europe Ltd 51 DSPLib For SH Application Note (19-031/1.0) Digital Filters FIR Filter x(n) (Input) Z-1 Z-1 Z-1 Z-1 Z-1 Z-1 N+1 Weights Adaptive Algorithm e(n)=d(n)-y(n) y(n) (Output) _ d(n) Desired Output (Reference) Diagram 4.7 :- LMS Filter Block Diagram With reference to the above diagram, the area within the grey block is the standard FIR filter implemented in a transversal structure. The output of the filter is subtracted from the desired signal output (reference) which in turn generates an error signal e(n). This error signal is used by the adaptive algorithm to update the filter weights (coefficients) in order to reduce the error signal, and thus eventually arrive at the desired filter response. Initially, when the filter hasn't started to adapt, the filter coefficients are set to an arbitrary value. The value chosen should not make a difference to the final value reached, however, it may have some affect on the rate at which it meets the ideal weights. Wk Actual Convergence Desired Convergence Optimal Weights W opt k Diagram 4.8 :- Convergence Of Filter Weights The above diagram illustrates the variations in the filter weights as the feedback adjusts them accordingly. Although the final weights are very close to the desired values, they never actually settle there, but will instead fluctuate around the optimum value. Hitachi Micro Systems Europe Ltd 52 DSPLib For SH Application Note (19-031/1.0) Digital Filters Example Now that the basic theory has been covered, an example should help to reiterate any problem areas. The example takes an input signal which consists of five sine waves added together, and adaptivly filters all but one of the components. The reference input is the frequency component that is left at the end. The C code is given first, followed by an in depth explanation of the process, including the analysis of various filter lengths. #include #include #include #define #define #define #define #define #define #define NSAM 128 TWOPI 6.283185307 no_stage 1 MAXSIZE NSAM NSAMS (NSAM/2) TWOMU 32767 ncoeff 8 /***************** Exercise *******************/ /* number of samples */ To Implement An LMS Adaptive Filter void main(void) { short coeff[ncoeff], ref[NSAM], noise[NSAM], noisea[NSAM], fftout1[NSAM], sine1[NSAM], sine2[NSAM], sine3[NSAM], sine4[NSAM], sine5[NSAM], signal[NSAM], output1[NSAMS], filout[NSAM]; int long short n, k, loop, res_shift; sum, len; *work; len = NSAM; res_shift = 16; loop = 100; /** Coefficients For Adaption **/ coeff[ncoeff]={1,1,1,1,1,1,1,1}; /*** Generate Sine Waves ***/ printf("Generating Sine Wave"); k = len / 8; for(n = 0; n < len; n++) { sine1[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5); } k = len / 16; for(n = 0; n < len; n++) { sine2[n] = floor(6553 * sin(TWOPI * k * n / len) + 0.5); } k = len / 4; for(n = 0; n < len; n++) Hitachi Micro Systems Europe Ltd 53 DSPLib For SH Application Note (19-031/1.0) Digital Filters { sine3[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5); } k = len / 32; for(n = 0; n < len; n++) { sine4[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5); } k = len / 64; for(n = 0; n < len; n++) { sine5[n] = floor(3162 * sin(TWOPI * k * n / len) + 0.5); } printf("Done"); /*** Create Signal To Be Filtered By Adding Sine waves ***/ for(n = 0; n < len; n++) { signal[n] = (sine1[n] + sine2[n] + sine3[n] + sine4[n] + sine5[n]); ref[n] = (sine3[n] ); } if(InitLms(&work, ncoeff) != EDSP_OK) printf("Problem With InitFir"); /*** Perform Adaptive Filter Operation ***/ /*** Adapt coeffs 100 times ***/ for(n=0; n < loop; n++) { if(Lms(filout, signal, sine3, NSAM, coeff1, ncoeff, res_shift, TWOMU, work) == EDSP_BAD_ARG) printf("Problem With LMS Operation"); } free(work); } The above code performs two tasks. First of all the code generates the signals which are to be filtered. Clearly this is for test purposes only, and in a real system, the data would come from a device such as an ADC. The signal to be filtered consists of five sine waves all added together. The corresponding frequencies of these signals are Fs/4, Fs/8, Fs/16, Fs/32 & Fs/64. The sum of these waves forms the data set to be filtered, consisting of 128 samples. Next, the reference signal is generated, which is chosen to be Fs/4. This means that the coefficients should adapt and remove all bar the Fs/4 component from the signal. Hitachi Micro Systems Europe Ltd 54 DSPLib For SH Application Note (19-031/1.0) Digital Filters Next step in the code is to initialise the filter workspace by calling the function InitLms. This function is used in exactly the same manner as the previously described initialisation routines. So finally in the code, we arrive at the actual filter operation. The function is called 100 times in order to demonstrate how well the coefficients adapt. The calling of the function is fairly self explanatory, and should not cause any problems. The function Lms modifies the filter coefficients stored in coeff[], and as a result they should be stored in on chip RAM, thus allowing fast access. If the coefficients are stored off chip, it is a good idea to copy them into on chip RAM. The spectrums of the signals involved in the filter operation are displayed below. Diagram 4.9 shows the five sine waves added together which forms the input to the adaptive filter. This signal was then adaptivly filtered as described above with eight coefficients. The value of 2 was chosen to be 32767, and wasn't changed during the whole of the testing period. The user may wish to start off with a similar value in order to establish operation, then change accordingly in order to achieve faster convergence. The result of filtering with eight coefficients or weights is displayed in figure 4.10. As can be seen, the unwanted elements of the signal have been attenuated, but not removed. The reason for this is that the small filter order physically cannot apply the attenuation required to remove the unwanted elements. FFT Of Input Signal 80 70 60 50 40 30 20 10 0 -10 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 Diagram 4.9 :- Spectrum Of Input Signal To Adaptive Filter Hitachi Micro Systems Europe Ltd 55 DSPLib For SH Application Note (19-031/1.0) Digital Filters FFT Of Signal Filtered By 8 Coefficients 70 60 50 40 30 20 10 0 -10 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 Diagram 4.10 :- Spectrum Of Filtered Signal Using 8 Coefficients As can be seen in Diagram 4.10, the filter has adapted to the reference signal, but has failed to filter the unwanted components, even after one hundred operations. Therefore, for the purposes of demonstration, the number of coefficients, i.e. the FIR filter length is increased to thirty two. This means that in the software example above, ncoeff = 32, and the array coeff consists of thirty two elements each containing one. With the increased coefficients, the program was ran, and the results shown in Diagram 4.11 below. It is clear to see that the increase in coefficients increased the selectivity of the filter, and was more capable to reject the unwanted components from the signal. In a real application, it is good practice to try different filter lengths on a trial and error basis, thus enabling the user to arrive at the most efficient implementation. Clearly large filters are bad news since they require more memory, and as a result run slower. FFT Of Result Of Filtering With Ref = Fs/4 70 60 50 40 30 20 10 0 -10 1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58 61 64 Diagram 4.11 :- Result Of Adaptive Filtering With 32 Tap Filter Hitachi Micro Systems Europe Ltd 56 DSPLib For SH Application Note (19-031/1.0) Digital Filters As can be clearly seen above, with a filter length of 32, the unwanted signal components have been completely removed. It is therefore unnecessary to use a filter of a higher order for this particular application. 4.2.8 Lms1 This function is the final one to be discussed from the filter section of DSPLib. The function is identical in every way to the function Lms, and the in depth discussion in the previous section is directly relevant here. The reader should consult both section 4.1.1 on FIR theory, and also section 4.2.7 detailing adaptive filters. Lms1 provides an algorithm which implements the Least Mean Squares adaptive algorithm on a single sample. Hence the function finds particular application in real time processing. The function Lms1 is defined as follows :int Lms1( output, input, ref_output, coeff, no_coeffs, res_shift, conv_fact, workspace) Where, short short short short long int short short *output input ref_output coeff[] no_coeffs res_shift conv_fact workspace output sample y(n) input sample x(n) desired output value d(n) filter coefficients h(n) number of coefficients right shift applied to output convergence factor 2 pointer to filter workspace By examining the filter definition it should be clear to the reader that the only difference between the two functions is that no_samples has been omitted. In applications where Lms is used with no_samples set to one, Lms1 should be used instead. Lms1 has been optimised for single sample operation, and will provide time savings over the use of Lms. If more than one sample at a time requires sampling, Lms should be used since it is quicker than using Lms1 in a for loop. Before calling Lms1, the filter workspace should be initialised. This is achieved by calling the function InitLms. This is the same routine as that used to initialise the Lms filter workspace. The workspace is used by the filter to store past samples which are required for computation. It is important that the user makes no attempt to access this RAM at any time during filter execution in order to ensure filter accuracy and speed. Because of the adaptive algorithm, Lms is a degree slower than Fir to operate. It is therefore a possibility that the Lms routine can be used to generate filter coefficients during development, and once finished can be inserted into a fixed FIR in order to speed up implementation. Clearly if constant adaptive filtering is required this method is not suitable. Hitachi Micro Systems Europe Ltd 57 DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation 5.0 Convolution and Correlation Introduction DSPLib offers five functions in this section, two related to correlation and three related to convolution. This text offers the reader a brief introduction to the theory behind convolution and correlation, before describing each function individually. The background theory is intended only as an introduction, and a relevant DSP text should be consulted for any additional information required. The five functions offered by DSPLib are as follows :ConvComplete ConvCyclic ConvPartial Correlate CorrCyclic The differences between each function are described in the routine descriptions section of this report (5.2). Applications of these functions include areas such as image processing where two images are compared for similarity, and also range and distance finding in areas such as sonar and radar. 5.1 Background Theory 5.1.1 Convolution The simplest way to describe convolution is to state that convolution in the frequency domain is equivalent to multiplication in the time domain. The equation for convolution is given below :W -1 y( m) = w(i ) x ( m - i ).2 - res _ shift i=0 It should be noticed that this identical to the expression used to describe an FIR filter. The operation of such a filter is that the input samples are convolved with the filter coefficients and result in a filtered output. The convcomplete function will return bit identical results to fir, when used in this manner. When not used for filtering, the convolution function convolves the two input data sets and returns a solution in y(m). The solution returned describes how an input signal interacts with the system in order to generate a particular output. 59 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation Relating correlation and convolution together is fairly simple. The convolution of two data sets is identical to the cross-correlation of one data set with the time reversal of the second data set relative to that used in convolution. Correlation is described below, and is important in the understanding of convolution. 5.1.2 Correlation Correlation essentially highlights similarities in data sets, and as a result finds applications in areas such as robotic vision and range finders. The equation for correlation is given below :W -1 y( m) = w(i ) x (i + m).2 - res _ shift i=0 The two data sets w(i) and x(i) are correlated, and the output y(m) will increase in amplitude when the two inputs exhibit similarities. Hence, in range finder applications, the transmitted pulse is correlated with the received pulse. A peak will occur when the transmitted data is received, and thus enables the distance to be computed. Alternatively, the output can decrease in amplitude, indicating a negative correlation, i.e. similar signals that are out of phase. Correlation can be used to examine the properties of a given data set. If a set of data is correlated against its self, the process is referred to as cross correlation. This method can be used to remove noise from a periodic signal, and is shown by way of an example later in this text. The library offers a number of different variations of the above two functions. Essentially, the same task is performed, and its only the range over which the computation is computed that differs. The actual differences are described below in the routine descriptions section (5.2), and also includes a number of examples written in C. The theory behind convolution and correlation is beyond the scope of this text. The descriptions and examples that follow should provide the reader with an outline of the subject, and should help to allow the user to access the functions operation without having to become involved with complex general mathematical descriptions often found in text books. It should also help to highlight typical application areas, thus presenting ideas for its use. Although the example programs are shown written in C, it is recommended that the optimised source code be used for the actual DSPLib function calls. The use of C compiled code for DSP functions is not suitable, and will introduce a large time overhead. All the optimised DSPLib functions are included in the file ensigma.lib. This file should be included in the link command file, when linking the programs together. 60 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation 5.2 Routine Descriptions 5.2.1 ConvComplete This function offers complete convolution between two data sets. The function is essentially the same as Fir, and will return identical results given the same data. The function is defined as follows :int ConvComplete(output, iw, ix, iw_size, ix_size, res_shift) where, short short short long long int output[] iw[] ix[] iw_size ix_size res_shift output data (size W+X-1) input w input x size of w (size W) size of x (size X) right shift applied to output This function completely convolves the input data sets, and inserts zeros at the start and end of the two arrays to enable this to take place. The function is mathematically described as follows :W -1 y( m) = w(i ) x ( m - i ).2 - res _ shift i=0 0 m < W + X -1 In order to ensure correct operation, a number of argument checks are performed, and are as follows :iw_size < 1 ix_size < 1 res_shift < 0 res_shift > 27 The flag EDSP_BAD_ARG is returned, and the function aborted if a bad argument is detected. As is the case with most of the DSPLib examples, optimum performance will be obtained if the program and data are located in on chip memory. Clearly the ability to achieve this depends upon the Microcontroller used, and program size. It is recommended that at the very least one of the input arrays be located in on chip RAM, which will help to avoid bus conflicts. 61 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation By way of an example of complete convolution, if the data set `1234' is to be completely convolved with the set `5678', the following shifts would be performed :00001234000 00000005678 00000056780 00000567800 00005678000 00056780000 00567800000 05678000000 Held static 1st convolution 2nd convolution 3rd convolution 4th convolution 5th convolution 6th convolution 7th convolution Note how the insertion of zeros enable the two data sets to be completely convolved. One data set is shifted relative to the other, and the convolution is taken. This example also indicates how the size of the output array is larger than the input arrays. In the case of ConvComplete, the output array size should be W+X-1. It is essential that this is ensured since failure to do so will result in the corruption of the stack. 5.2.2 ConvCyclic ConvCyclic offers a function which cyclically convolves two data sets and returns the result in a third array. The function is defined as follows :int ConvCyclic( output, iw, ix, size, res_shift) Where, short short short long int output[] iw[] ix[] size res_shift output data input w input x size of both input arrays (N) right shift applied to output Because the two input arrays are cyclically convolved, they must be the same size. It is up to the user to zero pad the smaller of the two arrays if their sizes are different. Cyclic convolution is mathematically described as follows : N -1 y( m) = w(i) x( m - i + N N ).2 - res _ shift i=0 0m 27 62 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation If any of the above are true, the function aborts and returns the flag EDSP_BAD_ARG. As is the case with most of the DSPLib examples, optimum performance will be obtained if the program and data are located in on chip memory. Clearly the ability to achieve this depends upon the Microcontroller used, and program size. It is recommended that at the very least one of the input arrays be located in on chip RAM, which will help to avoid bus conflicts. Clearly the three convolution functions offered by DSPLib perform the same mathematical operation between two sets of data. The way in which they differ is in the manipulation of arrays, and in the results that are included in the output. In the previous example, the two arrays were completely convolved, and zero padded to ensue that all results were obtained. Cyclic convolution is different to that, and is now shown in an example. Assume that the two arrays `1234' and `5678' are two be cyclically convolved. As before, one of the arrays is held static, and the other shifted around it. After each shift, the two arrays are convolved. This is performed until every element is convolved with every other element. E.g. 1234 5678 8567 7856 6785 Held Static 1st convolution 2nd convolution 3rd convolution 4th convolution Hence it should become clear that the output array is the same size as the input arrays. In the above example, 5678 was rotated four times in order to perform cyclic convolution. This is shown in the diagram below, where the outer circle is rotated. 5 1 8 4 2 6 3 7 Rotate Outer Circle Diagram 5.1 :- Cyclic Rotation 63 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation 5.2.3 ConvPartial ConvPartial offers partial convolution between two data sets. The function is essentially the same as ConvComplete, but will not include results that are obtained outside the data sets. The function is defined as follows :int ConvPartial( output, iw, ix, iw_size, ix_size, res_shift) Where, short short short long long int output[] iw[] ix[] iw_size ix_size res_shift output data (size X-W+1) input w input x size of w (size W) size of x (size X) right shift applied to output The input arrays can be different sizes, however, the solution does not include elements derived from outside the array boundaries. The function is mathematically described as follows :W -1 y( m) = w(i ) x ( m + W - 1 - i ).2 - res _ shift i=0 0 m X-W In order to ensure correct operation, a number of argument checks are performed, and are as follows :iw_size < 1 ix_size < 1 ix_size < iw_size res_shift < 0 res_shift > 27 If any of the above cases are true, the function will abort and return the flag EDSP_BAD_ARG. As is the case with most of the DSPLib examples, optimum performance will be obtained if the program and data are located in on chip memory. Clearly the ability to achieve this depends upon the Microcontroller used, and program size. It is recommended that at the very least one of the input arrays be located in on chip RAM, which will help to avoid bus conflicts. When using the function it is important to pay attention to the array sizes in order to avoid stack corruption. 64 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation The inputs can be different sizes, but input w must be larger than input x. The size of the output array is clearly essential, and is defined as X-W+1. By way of an example, the partial convolution method is now explained. Following on from previous examples in this section, we will convolve two data sets, one containing 1234 and the other 56789. Clearly, one array must be larger than the other in order to satisfy the argument checking described earlier. 1234 xxx5678 xx56789 x56789 56789 56789 56789x 56789xx 56789xxx Held static 1st convolution 2nd convolution As can be seen, the larger array is shifted across the smaller one. There are only two instances when the convolutional result is made up from results obtained within both arrays. The x's represent areas where results would be calculated from outside array boundaries, such as in ConvComplete, but in this case are rejected. Hence, in the above example, the output array would contain two elements, which holds true to the equation X-W+1 = 5-4+1 = 2. 5.2.4 Correlate Correlate offers complete correlation between two arrays, and deposits the solution in a third array. Convolution is the same as correlation, only one of the input arrays is in reverse order, i.e. time reversed. The function is described as follows :int Correlate(output,iw,ix,iw_size,ix_size,no_corr, res_shift) Where, short short short long long long int output[] iw[] ix[] iw_size ix_size no_corr res_shift output data (size X-W+1) input w input x size of w (size W) size of x (size X) number of correlations M to compute right shift applied to output The user can control the number of correlations made. Correlation outside the array can be performed, and zeros will automatically be inserted as required by the function. 65 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation The function is mathematically described as follows :W -1 y( m) = w(i ) x (i + m).2 - res _ shift i=0 0m 27 If one of the above is true, the function will abort and return the flag EDSP_BAD_ARG. As is the case with most of the DSPLib examples, optimum performance will be obtained if the program and data are located in on chip memory. Clearly the ability to achieve this depends upon the Microcontroller used, and program size. It is recommended that at the very least one of the input arrays be located in on chip RAM, which will help to avoid bus conflicts. The actual operation of Correlate is shown in an example at the end of this section (5.3). However, in order to clarify the operation of shifting arrays relative to each other during correlation calculation an example follows. Assume that the data set 1234 is to be correlated with 56789. Clearly, by looking at the argument checking, the second array must be larger than the first. The process starts with both arrays in line, and then shifts one relative to the other the user specified number of times. 1234 56789 56789 567890 5678900 56789000 Held static 1st Correlation 2nd Correlation 3rd Correlation 4th Correlation 5th Correlation Clearly the definition of the size of the output array requires care. The array will contain one value for each correlation performed, and therefore should be no_corr elements long. Special care should be exercised here since failure to allocate sufficient elements in the output array will result in corruption of the stack due to the nature of the C programming language. 66 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation 5.2.5 CorrCyclic CorrCyclic offers the same functionality as Correlate, only the input arrays are shifted in a cyclic manner relative to each other. The function is defined as follows :int CorrCyclic( output, iw, ix, size, res_shift) Where, short short short long int output[] iw[] ix[] size res_shift output data input w input x size of input arrays N right shift applied to output The function requires two input arrays of the same size. The two inputs are cyclically correlated and the solution returned in a third array called output. In certain circumstances, the result can be false, in which case Correlate must be used. It is important that the user has a good understanding of the correlation process before using this function. CorrCyclic can be mathematically described as follows : N -1 y( m) = w(i ) x ( i + m N ).2 - res _ shift i=0 0m 27 If any of the above are true, the function will abort and return the flag EDSP_BAD_ARG. For the purpose of development, it is important to test for this flag. However, once a working system is achieved, the argument checking can be removed in order to speed up execution time. As is the case with most of the DSPLib examples, optimum performance will be obtained if the program and data are located in on chip memory. Clearly the ability to achieve this depends upon the Microcontroller used, and program size. It is recommended that at the very least one of the input arrays be located in on chip RAM, which will help to avoid bus conflicts. Since the two data sets are cyclically correlated, the output array should be the same size as the two input arrays. In the example below, the shifts involved in correlating two four element arrays are shown, and it is also shown that the output solution contains four elements. 67 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation The data set `1234' is to be cyclically correlated with the data set `5678'. The shifts involved in this process are as follows :1234 5678 8567 7856 6785 Held static 1st correlation 2nd correlation 3rd correlation 4th correlation The example above should show that the array containing the results is the same size as the input arrays, in this case four elements. It is essential that the user defines an array of suitable size to contain the correlation solutions. Failure to do this will result in the corruption of the stack. 5.3 Correlation Example By way of an example, this section shows how the correlation function can be used to analyse the random properties of the noise generated by GenGWnoise. Sample C code is given, as well as plots of output data. Correlation can be used to determine how random a series of white noise samples are. This test uses the DSPLib function GenGWnoise which is explained in the miscellaneous section of this document. The function generates a series of white noise samples which are claimed to be random. The randomness of the white noise samples can be ascertained by performing auto correlation on the array. This means that the same set of samples are loaded into both input arrays in the Correlate function. The function then performs correlation, and returns a solution in a third array. The plot of this solution can be seen in diagram 5.2 below. The correlation process will highlight likeness between the input arrays. In our case, if the data is truly random, the level of correlation will be very low. The reason for this is that with random noise, each sample is has an equal probability of being positive or negative. Most samples will therefore cancel each other out, and will thus return a low correlation. The one exception to this is the first correlation, i.e. the first element of the output array. At this instance, each element of the input array is being correlated to itself since no shifting has yet taken place. This means therefore that a very high level of correlation will take place, and a large value returned to the first element. Once the data is shifted by one place, the level of correlation will become very low. This can be seen in Diagram 5.2 over the page, and serves to prove the random properties of the data. The white noise was generated by GenGWnoise, and is discussed in the next chapter. 68 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Convolution & Correlation Autocorrelation Of Gaussian White Noise 30000 25000 20000 15000 Series1 10000 5000 0 -5000 1 16 31 46 61 76 91 106 121 136 151 166 181 196 Diagram 5.2 The above plot confirms the random properties of the white noise generated by GenGWnoise. It is clear to see the large amount of correlation when the input arrays were aligned. Once the data was shifted, the correlation remained low indicating the lack of periodicity in the white noise. C Code Implimentation #include #include #include #define NSAM 128 void main(void) { /*** Define Local Variables ***/ short noisea[NSAM], noiseb[NSAM], result[NSAM]; int res_shift, a; float varience; varience = 32767; res_shift = 15; /*** Generate White Noise ***/ if(GenGWnoise(noisea, NSAM, varience) != EDSP_OK) printf("Cannot Generate Noise"); /*** Copy Arrays For Auto Correlation ***/ for(a = 0; a < NSAM; a++) noiseb[a] = noisea[a]; /*** Perform Auto Correlation ***/ if(Correlate(result, noisea, noiseb, NSAM, NSAM, NSAM, res_shift)!= EDSP_OK) printf("Problem With Correlate"); } 69 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions 6.0 Miscellaneous Functions Introduction This section offers the user a number of useful functions which may be used to `glue' certain DSP functions together. All the functions are unrelated, and perform small tasks which the user would be expected to require when implementing a number of DSP operations on an SH Microcontroller. Because of the diverse nature of the functions, software examples have not been provided. It is however hoped that all the functions are reasonably self explanatory, and should cause no problems. 6.1.1 GenGWnoise This function offers a method of generating true Guassian white noise, of a length specified by the user. The noise generated has zero mean, and a user specified variance. The method of generation is described in detail in the related Ensigma text, but essentially uses the rand operator in C, and arranges pairs of samples to a certain criteria. The function is defined as follows :int GenGWnoise( output, no_samples, variance) Where, short long float output[] no_samples variance array containing white noise samples number of output samples required user defined variance of noise distribution 2 In order for the function to operate correctly, the user must define no_samples to be one or greater, and also set the variance to a value greater than 0.0. Failure to ensure this will result in the function aborting and returning the flag EDSP_BAD_ARG. The equations to determine the white noise samples are as follows :o1 = r1 -2 ln( x ) / x o2 = r2 -2 ln( x ) / x The variables r1 and r2 represent two numbers generated by the random number generator rand, between -1 and 1. These two random numbers are then squared and added together. If the sum is less than one, then they will be used in the above equations. If the sum is greater than one, another pair of random numbers will be generated, and tested in the same manner. This process continues until the user specified number of noise samples have been generated. 71 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions Samples are generated in pairs by the equations shown above. If an odd number of samples are required, the second of the two solutions is discarded. It should also be noted that this function isn't suitable for real time operation due to the floating point arithmetic employed. 6.1.2 MatrixMult MatrixMult offers an optimised method of performing matrix multiplication. Two matrices are multiplied together and the result deposited in a third. The output matrix should not be `in-place', i.e. should not conflict with either of the input matrices. The function is defined as follows :int MatrixMult(op_matrix, ip_matrix1, ip_matrix2, no_rows1, no_cols1, no_cols2, res_shift) Where, void void void long long *op_matrix *ip_matrix1 *ip_matrix2 no_rows1 no_cols1 long int no_cols2 res_shift pointer to first element of output matrix pointer to first element of input matrix1 pointer to first element of input matrix2 row dimension of matrix1 (m) column dimension of matrix1 & row dimension of matrix2 (n) column dimension of matrix2 (p) right shift applied to each output The function will fail if any of the dimensions m,n or p are less than one. Also the stated shift applied to the output matrix elements must be greater than zero, and less than twenty eight. If the function fails the flag EDSP_BAD_ARG is returned. Care must be exercised when specifying array size to contain the array solution. It should be noted that void pointers are used to point to the first element of the matrices. When multiplying two matrices, it is essential that they are conformable. That is the column of the first matrix must be the same size as the row of the second. The dimension of the resulting matrix will be m x p. Matrices are stored in the standard C manner, in one array. The elements are ordered as follows : a0 a3 a6 a1 a4 a7 a2 a5 a8 would be represented by the array :- {a0 , a1 , a2 , a3 , a4 , a5 , a6 , a7 , a8 } 72 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions In order to ensure maximum efficiency, the user should ensure that matrix1 is located in on chip RAM. Clearly the ability to do this depends upon the SH device being used, as well as other memory allocation considerations. Clearly the function will operate correctly with both matrices located off chip, but bus conflicts may arise which will slow down the execution time. 6.1.3 VectorMult This function offers an optimised routine for multiplying two vectors together. The result of the multiplication is stored in a third array of the same size as the inputs. The output array should be located in a different memory location to both of the inputs, i.e. it should not be in-place. A typical application of this function is to apply a window to a set of data. The function is defined as follows :int VectorMult(output,ip1,ip2,no_elements,res_shift) Where, short short short long int output[] ip1[] ip2[] no_elements res_shift array containing output input array 1 input array 2 number of elements in each array right shift applied to each output The function will abort if the user specifies the number of elements to be less than one. It will also abort if the right hand shift applied is less than zero or greater than sixteen. If the function aborts, the flag EDSP_BAD_ARG will be returned. It is important that the two input arrays used with this function are the same size, and that the no_elements argument is set to this size, or less. Clearly, if the user doesn't want to multiply all the array elements together, the no_elements argument is set to a value less than the input array size. Vector mult can be represented by the following C code. Clearly, if the user wishes to use this type of process, VectorMult should be used since it offers an optimised version compared to that generated by most C compilers. This will be of particular importance in real time applications where execution time is of great importance. short long int output[NSAM], ip1[NSAM], ip2[NSAM]; NSAM = 128; res_shift = 0, a; for(a=0; a < NSAM; a++) output[a] = (ip1[a] * ip2[a]); As can be seen, the function performs element wise multiplication. I.e. it multiplies the first element of ip1 by the first element of ip2 and so on. 73 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions If the user requires the calculation of the dot product of two arrays (every element multiplied by every element), the matrix mult should be used with n set to one. For more details on MatrixMult the reader should consult section 6.1.2. 6.1.4 MsPower MsPower offers a function which will simply calculate the mean square power of an input array presented to it. The result is returned as a long (32-bits) in a single element pointed to by a pointer defined by the user. The function is defined as follows :int MsPower( output, input, no_elements) Where, long short long *output result input[] input array no_elements number of elements contained in input array The function performs simple argument checking, and will return EDSP_BAD_ARG if no_elements is less than one. The function is mathematically described as follows :Mean Square Power = 1 N N -1 x (i ) 2 i= 0 As can be seen the function simply returns the mean of the sum of the squares of the elements in the input array. The result of the division is rounded to the nearest integral value. There is a risk of overflow if the number of elements specified is more than 2321. 6.1.5 Mean This function follows directly from that described above. The function calculates the mean of a set of data presented to it in an array. The single element solution is a short and is pointed to by a pointer defined by the user. The function is defined as follows :int Mean( mean, input, no_elements) Where, short *mean mean of input data x short input[] input array x long no_elements number of elements N in x to process 74 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions The function performs basic argument checking, and will return the flag EDSP_BAD_ARG if the number of elements N specified is less than one. The function is mathematically described as follows :x= 1 N N -1 x (i ) i=0 The result of the division is rounded to the nearest integral. If the user specifies a number of elements that is larger than 216-1 then the possibility of overflow exists. 6.1.6 Variance This function offers the user a routine to calculate the variance of a set of input data. The user presents the function with an array containing the data to be analysed, and the function will then return both the mean of that data and also the variance. The function is defined as follows :int Variance( variance, mean, input, no_elements) Where, long *variance short *mean short input[] long no_elements variance of the input array 2 mean of the input array x input data x number of elements N in input array Basic argument checking is performed, and the function will abort and return the flag EDSP_BAD_ARG if the number of elements specified is less than one. Mathematically, the function performs two tasks. Firstly it calculates the mean of the data, and makes this solution available to the user. Secondly, the variance is calculated using the solution to mean calculated previous. The equations for mean and variance are as follows :mean = x = var ience = 2 = 1 N 1 N N -1 x(i) i=0 N -1 ( x (i ) - x ) 2 i=0 In each case, the results of the divisions are rounded to the nearest integral value. The result of mean is returned as a short which is 16-bit. Variance solution is returned as a long 32-bit value. There is a possibility of mean overflowing if the number of elements is greater than 216-1. The variance calculation is not checked for overflow. 75 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions 6.1.7 MaxI This function will locate the maximum value in an array presented to it, and will then return a pointer to that value. It does not return the value itself. Maxl is defined as follows :int Maxl( max_ptr, input, no_elements) Where, short **max_ptr address of pointer to max element short input[] input array to be searched long no_elements number of elements in input to be searched The function will abort and return the flag EDSP_BAD_ARG if the user specifies less than one element in the input array. The address of the element with the maximum value is returned in max_ptr. If two elements or more have the same maximum value, the address of the element nearest to the start of the array will be returned. 6.1.8 MinI Minl compliments the above function in that it returns the address of the lowest element in the input array presented to the function. The actual minimum value is not returned, but instead a pointer to it. The function is defined as follows :int Minl( min_ptr, input, no_elements) Where, short **min_ptr address of pointer to min value short input[] input array to be searched long no_elements number of elements to be processed The function will return the flag EDSP_BAD_ARG if the user specifies a number of elements that is less than one. As with the above function Maxl, if there are two or more min elements the function will return a pointer to the element that is closest to the start of the array. Timings for these functions are impossible to quote since the total execution time depends upon the number of elements in the array being searched. One method of further improving the execution speed is to remove the argument checking, as long as the number of elements in the input array is greater than one, this will not cause and problem, nor compromise the reliability of the function. 76 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) Miscellaneous Functions It should be noted that removal of argument checking is only recommended on simple functions such as this one. For more complicated functions, the argument checking will provide an effective tool for debugging processes. 6.1.9 Peakl This final function is simply a variation on Maxl. The difference with this function is that it will return the address of the maximum absolute value in the input array. I.e. it looks for the largest value irrespective of sign. As an example in an array of two elements {30,-50} Peakl would return a pointer to the -50 element. The function is defined as follows :int Peakl( peak_ptr, input, no_elements) Where, short short long **peak_ptr address of pointer to the peak element input[] input array to be searched no_elements number of elements in input to be processed The function performs basic argument checking and will return the flag EDSP_BAD_ARG if the user specifies the number of elements to be less than one. The process can be speeded up by removing the argument checking, but the reader should first read the note at the base of the previous page. As with the previous two functions, if there are two elements or more that have the same absolute peak value, the function will return the address of the element that is closest to the start of the array. 77 Hitachi Micro Systems Europe Ltd DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation 7.0 DSPLib Implementation Introduction This final section of the application notes sets out to give a typical example of how a working system based around the SH1 processor can be implemented. By the very nature of DSP, it is very unlikely that this example will exactly suit the needs of the reader due to the great diversity of digital signal processing. However, the aim of this section is to give the reader some ideas about hardware set-up, as well as programming around the functions offered by DSPLib. Furthermore, it is hopped that consultation of this section will speed up the design process, and help to avert possible time consuming problems. Implementation can be split into two distinct areas, hardware and software. The hardware design considerations are considered first, followed by some typical software examples to tie in. Clearly, any software examples given will be fairly hardware dependant, but should help the reader to understand the process of interfacing DSPLib with the real world. 7.1 Hardware The hardware implementation of a DSP process is greatly dependant upon the process it is intended for. Having said that, every DSP system can be broken down and shown to contain at least two if not all of these three distinct sections :Analogue To Digital Conversion Digital Processing Digital To Analogue Conversion A system containing two of the above sections is the compact disc player for example. The data is already in digital format, and so requires no analogue to digital conversion. Conversely, a mobile telephone contains all three, with analogue data coming from the microphone, and also being sent to the speaker. The hardware example given in the following text contains all three sections, and is capable of performing digital processing on real analogue signals, and returning the solutions as real analogue signals. It is important for the reader to remember that the data conversion processes are not simply digital to analogue or analogue to digital converters, but instead require the careful consideration of the use of sampling rates, and also filtering. In the data conversion process, the use of anti-aliasing and anti-imaging filters are required. Since it is the purpose of this document to show how the Hitachi SH-1 processor can be interfaced, notes on filter design are not included. Hitachi Micro Systems Europe Ltd 79 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation The filters used in a typical application take the form of a low pass network with a Butterworth response. The filter order chosen depends upon the acceptable level of aliasing/imaging but would typically be around 6th order implementations. 7.1.1 Analogue to Digital Conversion This process consists of three distinct blocks, and is shown below. The real world analogue signal is firstly bandlimited by the anti-aliasing filter. The purpose of this filter is to prevent signals that are near or above the Nyquist frequency (twice the sampling frequency) passing into the sampling device. As discussed above, the filter would normally take the form of a low pass network with Butterworth response. The band limited signal is passed into the analogue to digital converter. In the case of this example, the on chip ADC has been used, which produces a ten bit representation of the analogue data. The ADC is controlled by a sample rate generator. The sample rate dictates the ADC conversion process, and should be chosen to be at least twice the highest frequency component present in the input. In this particular example, the bandwidth of the input signal is 3.4Khz, and the sampling rate was chosen to be 8Khz. This system should in part help to mimic telephone based systems which have similar specs. There are many methods open to the user for sample rate generation. In order to gain maximum efficiency, it is suggested that the sample rate be synchronised to the CPU clock. For the purposes of this example, the CPU clock has been divided by external hardware dividers, eventually arriving at a frequency of 7.8Khz, which is an acceptable rate for the sampling of the given signal bandwidth. Analogue Input Anti-Aliasing Filter (6th Order) Band Limited Analogue Signal Analogue To Digital Converter 10-bit Data Out Fs CPU Clock Sample Rate Generator Diagram 7.1 :- Analogue Interface As mentioned earlier, the Hitachi SH-1 processor contains an on chip ADC. This device gives ten bit resolution, and helps to simplify the implementation of a real time system. Clearly if the user requires a higher resolution, an external ADC should be used. The device could be memory mapped, and accessed in the same manner as RAM. Notes on using the internal ADC are included in the software section of this document. For further information on the SH-1 peripherals, the reader should consult the SH7032/SH7034 hardware manual. Hitachi Micro Systems Europe Ltd 80 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation Once the data is converted into digital form, it is suitable to present to the DSPLib routines. The actual interface software used to load data in, and also write data out at a rate of 8Khz is described in the software section. 7.1.2 Digital To Analogue Conversion Once the digital processing is complete, it is necessary to convert the descrete time domain samples returned by DSPLib into a continuous analogue waveform. This process involves two stages, and is rather obviously the reverse to the analogue conversion process described above. Because the SH-1 series of microcontrollers don't have on chip DACs, the user must interface a device to it. The actual specific details of how this can be achieved are included in the circuit description section of this report. Keeping to a more general form, the best location for a DAC on the SH series of devices is on the data bus. By locating the DAC on the memory map, the device can be accessed exactly like a piece of RAM. The access time to the DAC can be controlled by the insertion of wait states, and these are discussed fully in the software description section at the end of this report. The use of serial data is very popular in many DSP systems today. The SH-1 has two bi-directional serial ports which could be used to interface to serial DACs as well as ADCs. It should however be noted that the respective converters would most probably not interface directly to the serial port, and would infact require some `glue' logic. The main reason for this is that the serial protocol used by the SH series would probably not be suitable for the converter being connected. The use of some `glue' logic would enable the associated start and stop bits to be removed/modified inorder to satisfy the chosen serial device. When converting data from digital to analogue, it is essential that the resulting waveform is filtered in order to remove most imaging frequencies. The imaging frequencies are generated as a result of the steps that are produced by the DAC. These steps are referred to as quantisation, and reflect the accuracy of the converter. As the converter can only output one of say 1024 levels (10-bit), the waveform is stepped in between. The sharp edges of the steps create many harmonics which roll off with a sinex/x function in the frequency domain. The purpose of the anti-imaging filter is to allow frequencies only in the bandwidth of interest to be passed, thus stripping the signal of the unwanted quantisation steps. It is specific to the user to specify the response of the filter. The filter is normally low pass, and exhibits Butterworth characteristics. In the case of the example system described in this section, a 6th order filter was chosen, with a 3dB bandwidth of 3.4Khz. For both the ADC and DAC process, use was made of a virtual ground. The purpose of this process is to make the signal presented to the ADC purely positive. The DC offset introduced was then removed by software. Hitachi Micro Systems Europe Ltd 81 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation Once the DSP operation is complete, the DC offset is re-introduced, and the purely positive output signal then sent to the DAC. The output of the DAC is de-coupled in order to remove the signals DC content. By using this method, a single quadrant converter can be used at both ends, thus helping to keep system cost to a minimum at the expense of only a few clock cycles. 7.1.3 Circuit Description The reader should consult the enclosed schematic of the example system hardware when reading the following description, which can be found at the end of this section. The input signal is applied on the ADC_In line on the left of the schematic. This signal has been band limited to 0 - 3.4Khz in order to prevent aliasing occurring. Once applied, the signal is de-coupled by C1 which has the effect of removing any DC component present in the signal. Next the input is passed through a unity gain buffer. This stage is included to isolate the filter stage from the ADC process. It is important to ensure that the final stage of the anti-aliasing filter is not loaded in any way, which could in turn cause the filter response to be altered. By using a buffer, the final stage of the filter will `see' a high impedance, thus removing any risk of loading. The output of the buffer is fed back to the inverting input, which forces the amplifier to act as a voltage follower. Once buffered, the signal is once again decoupled by C2, which removes any DC introduced by the amplifier. The next stage concerns the introduction of a 2.5 volt DC offset or `virtual ground'. The purpose of this process is to ensure that the signal applied to the ADC is purely positive. Up to this point, the input signal has been around five volts peak to peak, centred around zero. Now, a DC offset is added to the signal, which centres the signal around 2.5 volts, shifting the whole signal positive. This will now ensure that the whole signal is sampled with no loss of information. After sampling, the DC offset is removed using software techniques. This process is described in the software section of this document. The DC offset is generated by a dedicated virtual ground generator device U8. There are a number of these devices on the market, and any three pin type should be suitable. It is important to remember that the reference should be as noise free as possible, as noise here is introduced into the signal. The use of a bandgap type generator should ensure low noise operation. It is important that a zener device is not used as these are very noisy and will degrade the systems signal to noise ratio. Prior to application to the ADC, the signal is passed through an over voltage protection circuit provided by D1 and D2. If the input is between 0 and 5 volts, the diodes have no effect on the signal. However, if the voltage should rise above the supply rail, the diode will have the effect of clamping the signal to the supply. This is also true if the signal should fall lower than zero, diode D1 will clamp the input to ground. This arrangement is included to offer some protection over the SH-1 on chip ADC. Hitachi Micro Systems Europe Ltd 82 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation After the input protection, the signal is in suitable form to be applied to the SH-1 ADC. This ADC has eight multiplexed inputs, and in the case of this example, input AN0 has been chosen. For more information on the SH-1 hardware configuration, including the ADC, the reader should consult the SH7032/SH7034 Hardware Manual. The SH-1 ADC can be configured such that it is triggered externally, as is the case in this example. The trigger is applied to the ADTRG line located on pin 63. The trigger is a square wave with 50% duty cycle, and conversion starts on each falling edge. The trigger signal is derived from the 16Mhz CPU clock which is available from the CK line on pin 71. This output is derived from the system crystal, and has been cleaned up and passed through a duty cycle correction circuit. The 16Mhz signal is repeatedly divided by three four bit ripple counters U4, U5 and U6. The end effect is 16Mhz / 2048 = 7.8Khz. The resultant signal from U6 is used to start the ADC. An alternative method for sample rate generation is to program the SH-1 timing pattern generator. Details on this subject may be found in the SH-1 Hardware Manual. The final section of hardware to explain is the digital to analogue conversion. This is achieved by way of a memory mapped 12-bit device U2. Although a twelve bit device has been chosen for this example, the output resolution is only really 10-bit due to the 10-bit ADC used. If the reader wanted a 16-bit system, the method of connection is identical, only the device would use the first sixteen data lines on the bus instead of the first twelve in this case. Because the device is memory mapped, a small amount of address decoding logic is required. The SH-1 has most of the address decoding provided on chip, and the memory map for the device can be seen in its associated hardware manual. In the case of this example, the DAC is located in area 6 which starts at address E000000 and ends at EFFFFFF. When an address within that area is accessed, the line CS6 falls low. In the case of a write operation, which is what happens with the DAC, the WR pin falls low as well. U3 controls the chip select on the DAC, by ensuring that the CS pin on the DAC falls low when both the CS6 line and the WR line are low. By the nature of the real time operation of the system, the CS pin on the DAC should fall low at the same rate as the ADTRG pin, in this case 7.8Khz. The width of the write cycle can be controlled by the SH-1 wait state controller. Correct programming of this is important to ensure that the write cycle is long enough to be seen by the DAC. In the case of the AD668 device shown, a write cycle of length 125nS was used, which equates to two CPU clock cycles. The DAC used in this example is an Analog Devices AD668. This device is suited to direct connection to CPU buses due to its data latch arrangement. It is important that the power supply arrangements are noted, a split rail supply of 16 volts is required. For more details on this device, the appropriate data sheet should be consulted. Hitachi Micro Systems Europe Ltd 83 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation The output of the DAC is 0 - 5volts and will contain a large amount of quantisation noise. The removal of the DC offset can be achieved by de-coupling the signal, before passing it to a suitable Butterworth low pass filter arrangement. 7.2 Software This section describes the relatively straight forward software routines required to pass real time data through the system outlined in the earlier hardware description. The routines are specific to the hardware configuration detailed, but are written such that modification for any application should be quite simple. The main program is written in ANSI C, and in turn calls a number of functions. For optimum performance, these functions have been written in assembler. The assembler format shown in the examples below is SH series, and will work with the relevant Hitachi tools. It should be noted that the code will not assemble with the GNU tools that are available for the SH series. In order to achieve this, the format of the code will have to be changed as appropriate. 7.2.1 Main() This is the main program which as described calls the other routines in turn. The code can be seen below. #include #include #include #include "coeff.dat" #define NSAM 1 #define no_sections 1 void void void void system(void); dac(short*); adc(short*); detect(short*); void main(void) { short *pointer; short sample; /* Data From ADC */ short output; /* Data For DAC */ short *result; /* Start DSP operation Flag */ short *work; int end = 0; pointer = &sample; result = &output; system(); /* Initialise System (Wait State = 1)*/ Hitachi Micro Systems Europe Ltd 84 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation /** Filter workspace **/ if(InitIir(&work, no_sections) != EDSP_OK) printf("Problems ....."); /** Set Up ADC And DMAC **/ adc(pointer); /** Real Time Loop **/ do { detect(pointer); /** Wait For End Of DMA **/ Iir1(result,sample,coeff,no_sections,work) dac(result); /*Output Data To Memory Mapped DAC */ } while(end == 0); } Following the above C code, the operation is as follows :The code implements an IIR filter using the function Iir1 from DSPLib. The first part of the code is the include statements, and as well as the expected header files, the ensidsp.h is included. This header file contains all the function definitions for DSPLib, as well as some frequently used constants. At this point, the coefficients for the IIR filter are included as well in the file coeff.dat. After some function prototypes and constant declarations, the code starts proper. The first function call is to system(). This function is not part of DSPLib, but is instead a small section of code which sets up the hardware on the SH-1. This function is described in detail in section 7.2.2. Following on from system(), the IIR filter workspace is initialised. This function is described in section 4.2.3. In order for this function to operate correctly, it is essential that the memory sections are defined properly in the linker command file. For further information on the Hitachi linker, the reader should consult the H series linkage editor users manual. Next in the code, the on-chip ADC is initialised. This function adc() is described in section 7.2.3, and requires a pointer which is used to point to the sampled value. The function makes use of the DMA (Direct Memory Access) controller. When the ADC has completed its conversion, the result is moved into a memory location pointed to by the pointer provided by the user in one clock cycle. Now we enter the real time loop. Until now, the code has been set-up, and only executed once. Inside this loop, the processor performs sample by sample computation. The first function we meet is called detect(). This function forces the processor to wait for the end of the DMA operation before continuing the operation. Hitachi Micro Systems Europe Ltd 85 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation This ensures that the DSP operation is only performed on a new sample. The function is described in section 7.2.4. Once the end of the DMA operation has been detected, the DMA controller is reset, and the DSP operation can take place on the single sample received. In order to achieve real time operation, it is essential that the DSP task ends before a new sample arrives. When the DSP task is complete, the function dac() is used to output the result to the memory mapped DAC described in the hardware section of this document. The function dac() is described in detail in section 7.2.5. 7.2.2 system() The H series assembler for this function is as follows :_system: ;Initalise Wait States For DAC ;Set Up Pin Function Controller MOV.L MOV.L RTS MOV.W PACR1,R3 SETUP,R2 MOV.L MOV.L RTS MOV.W WCR3,R3 WAIT,R2 ;Enable ADTRG Pin R2,@R3 ;One Wait State R2,@R3 .ALIGN 4 WCR3 PACR1 SETUP WAIT .data.l .data.l .data.l .data.l H'05FFFFA6 H'05FFFFC8 H'0000330A H'00008800 .END This function performs two tasks. The first task is to set up the pin function controller on the SH-1 such that the ADTRG pin is selected. This will allow the on chip ADC to be externally triggered via that pin. This process is achieved by setting up the PACR1 register. Finally, the function sets up the wait state controller such that the write cycle to the external DAC is of a sufficient period. In this particular example, the write cycle is two CPU clock cycles long. The register WCR3 was used to control this task. Details on both the registers mentioned here can be found in the SH-1 hardware manual. Hitachi Micro Systems Europe Ltd 86 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation 7.2.3 adc(short*) The SH assembler listing for this function is as follows :_adc: ;Set Up DMAC MOV.L MOV.L MOV.L MOV.L MOV MOV.L MOV.L MOV MOV.W MOV.L MOV.L MOV.w SAR0,R1 ADDRA,R2 R2,@R1 DAR0,R1 R4,R2 R2,@R1 TCR0,R1 #H'01,R2 R2,@R1 CHCR0,R1 DATA,R2 R2,@R1 ;Start DMAC MOV.L MOV MOV.W DMAOR,R1 #H'01,R2 R2,@R1 ;Set Up ADC MOV.L MOV MOV.B MOV.L MOV RTS MOV.B ADCSR,R1 #H'048,R2 R2,@R1 ADCR,R1 #H'0FF,R2 R2,@R1 ;Register Vectors DATA ADDRA ADCSR ADCR SAR0 DAR0 TCR0 CHCR0 DMAOR .ALIGN 4 .data.l .data.l .data.l .data.l .data.l .data.l .data.l .data.l .data.l H'0000D09 H'05FFFEE0 H'05FFFEE8 H'05FFFEE9 H'05FFFF40 H'05FFFF44 H'05FFFF4A H'05FFFF4E H'05FFFF48 .END This function performs several tasks. When calling the function, the user provides a pointer to a variable. This pointer is used by the function to set up the DMA controller such that the ADC value is transferred to it in one clock cycle. The setting of the DMA is quite intricate. Hitachi Micro Systems Europe Ltd 87 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation It is therefore recommended that the reader consult the SH-1 hardware manual DMA section. The registers seen above can be followed, and set up to suit a particular application. When the DMA set up is complete, the DMA controller is started. This now means that when the ADC generates an interrupt at the end of a conversion, the DMA will transfer the data to the destination address in one clock cycle. The final part of this code sets up the internal ADC. The options chosen include an external trigger option and also an interrupt generation which in turn starts the DMA controller. From now onwards in this code, the ADC will always start converting when the external trigger pin falls low. After each DMA operation, some registers need resetting. This task is performed by the detect() function described in the next section. 7.2.4 detect(short*) The detect routine forces the process to wait until the input data is valid, i.e. a new sample. The assembly listing of this function is as follows :_detect: CLRT MOV.L CHCR0,R1 LOOP: MOV.W @R1,R2 EXTU.W R2,R0 TST #H'02,R0 ;Test bit goes to one if AND is zero BT LOOP ;Convert Sampled Data pointed to by R4 MOV.W @R4,R1 EXTU.W R1,R2 SHLR R2 SHLR R2 SHLR R2 SHLR R2 ADD #H'03,R2 MOV.L DATA,R1 SUB R1,R2 MOV.W R2,@R4 ;Start DMA And ADC Again ;Reset TCR0 & CHCR0 & DMAOR MOV.L MOV.L MOV.W MOV.L MOV MOV.W MOV.L MOV MOV.W CHCR0,R1 DATA1,R2 R2,@R1 TCR0,R1 #H'01,R2 R2,@R1 DMAOR,R1 #H'01,R2 R2,@R1 Hitachi Micro Systems Europe Ltd 88 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation ;Reset ADCSR MOV.L MOV RTS MOV.B ADCSR,R1 #H'048,R2 R2,@R1 .ALIGN 4 DAC TCR0 CHCR0 DMAOR DATA1 ADCSR DATA .data.l .data.l .data.l .data.l .data.l .data.l .data.l H'0E000000 H'05FFFF4A H'05FFFF4E H'05FFFF48 H'00000D09 H'05FFFEE8 H'000008A3 .END The first part of this code consists of a loop which tests for the end of a DMAC operation. The program will stay in that loop until the DMA has finished, then come out and execute the remaining code. The end of a DMA means that a new sample is ready for processing. By making the CPU wait for a new sample, the risk of processing the same sample twice is removed. Next, the new sample is converted into the correct form ready for presentation to the DSP task. The sample from the on chip ADC sits in the upper 10 bits of a sixteen bit word. The ten bits are therefore shifted right by four places and the two least significant bits set to one. The result of this process is that the data is now in 12 bit form suitable for the DAC. Next, the DC offset introduced by the hardware is removed by a single cycle subtraction. If the chosen DSP task is operating correctly, it should return a 12 bit solution which can have an offset applied to it, and then sent to the DAC. That process is described in section 7.2.5. Once the data is in suitable form, the DMA controller is reset and started again. The transfer count register is reset, and the controller is started again. This means that a DMA operation can take place during the DSP task, and the new sample is ready for processing immediately. Finally, the ADC is started again by writing to the ADCSR register. For full details on the setting and controlling of SH-1 peripherals, the SH-1 hardware manual should be consulted. 7.2.5 dac(short*) This final routine simply takes the solution returned by the DSP process, and writes it to the memory mapped DAC. The SH assembly listing for this function is as follows :- Hitachi Micro Systems Europe Ltd 89 DSPLib For SH Application Note (19-031/1.0) DSPLib Implimentation _dac: ;Output Value To DAC MOV.L MOV.W MOV.L ADD RTS MOV.W DAC,R3 @R4,R2 OFFSET,R1 R1,R2 R2,@R3 .ALIGN 4 DAC OFFSET .data.l H'0E000000 .data.l H'00000800 .END As can be seen, the DAC is located at address E000000. This location is in area 6, and ties in with the hardware description earlier. The value returned by the DSP process has a DC offset added to it. The offset is hex 800, which corresponds to half of the twelve bit output. Therefore, with this offset, the maximum output swing is achievable. Half of full scale corresponds to 2.5volts, which as the hardware section will confirm, is the same as the offset applied to the input. Once the offset is applied the value is written to the DAC as if it were a piece of memory. The write process has already been set-up in the system() function described in 7.2.2. When the value has been written, the program counter returns to detect() and waits until a valid sample is in memory, and then the process starts again. 7.3 Timing When writing code for a real time process it is essential that the operation stays within the allotted time space. For the system described in this section, the processing time can be calculated as follows. CPU = Sample Rate = 16MHz 7.8 KHz Between each sample there is 128uS. This time corresponds to 2048 CPU clock cycles. From this total, the IO operations steal cycles as does the conversion of data. For optimal execution speed DO NOT compile the ANSI C functions included in DSPLib. These are included for reference only, and will not lead to an efficient implementation of a given DSP task. The reader should use the optimised source code in the optsrc directory in DSPLib, or alternatively simply use the ensigma.lib during the link operation, which is in turn built from the optimised code. Hitachi Micro Systems Europe Ltd 90 DSPLib For SH Application Note (19-031/1.0) Appendix One Appendix One Ensigma Software Guide For DSPLib Hitachi Micro Systems Europe Ltd 92 DSPLib for SH User Guide E. Andrews Sept 15, 1995 c Ensigma Ltd 1995 Copyright c Hitachi Micro Systems Europe Ltd 1995 Copyright Page 2 of 87 Disclaimer When using this document, keep the following in mind, 1. This document may, wholly or partially, be subject to change without notice. 2. All rights are reserved: No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without Hitachi's permission. 3. Hitachi will not be held responsible for any damage to the user that may result from accidents or any other reasons during the operation of the user's unit according to this document. 4. Circuitry and other examples described herein are meant only to indicate the characteristics and performance of Hitachi's semiconductor products. Hitachi assumes no responsibility for any intellectual property claims or other problems that may result from applications based on the examples therein. 5. No license is granted by implication or otherwise under any patents or other rights of any third party or Hitachi, Ltd. 6. MEDICAL APPLICATIONS: Hitachi's products are not authorised for use in MEDICAL APPLICATIONS without the written consent of the appropriate ocer of Hitachi's sales company. Such use includes, but is not limited to, use in life support systems. Buyers of Hitachi's products are requested to notify the relevant sales oce when planning to use the products in MEDICAL APPLICATIONS. Page 3 of 87 Contents 1 Introduction 6 2 Installation 8 3 Data Formats 10 4 ANSI C Library 11 5 Eciency 12 6 Test and Example Programs 13 7 Fast Fourier Transforms 15 FftComplex : FftReal : : : : ItComplex : ItReal : : : : FftInComplex FftInReal : : ItInComplex ItInReal : : LogMagnitude InitFft : : : : FreeFft : : : : 8 Windowing GenBlackman GenHamming GenHanning : GenTriangle : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 19 20 22 23 25 26 28 29 31 32 33 34 35 36 37 38 Page 4 of 87 9 Filters Fir : : : Fir1 : : Iir : : : Iir1 : : : DIir : : : DIir1 : : Lms : : Lms1 : : InitFir : InitIir : InitDIir : InitLms : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 10 Convolution and Correlation ConvComplete : ConvCyclic : : ConvPartial : : Correlate : : : : CorrCyclic : : : 11 Miscellaneous GenGWnoise MatrixMult : VectorMult : MsPower : : : Mean : : : : : Variance : : : MaxI : : : : : MinI : : : : : PeakI : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : : 39 42 44 46 48 50 52 54 57 60 61 62 63 64 66 68 69 71 73 74 75 77 79 80 81 82 83 84 85 Page 5 of 87 A Contents of Distribution Disk 86 References 87 Page 6 of 87 1 Introduction This document describes DSPLib-SH, a library of signal processing routines developed for use with Hitachi's SH RISC processors. Two versions of the library are provided. In the rst version all of the timecritical routines are C-callable, but have been written in SH assembler to minimise execution time whilst maintaining accuracy. The second version of the library contains no assembly code. All routines are written in portable ANSI-C and aim to give results that are bit identical to the optimised assembly library. These libraries allow a development methodology beginning with simulation and program development on a workstation using the ANSI-C library. Development will then transfer to the target hardware, using the optimised version of the library, where hardware dependencies and real-time issues can be resolved. This methodology enables designers to gain the bene ts of using ecient hand-coded software for the time-critical sections of their programs, whilst retaining the bene ts of developing software written in C'. Five groups of routines are included in DSPLib-SH| Fast Fourier Transforms Windowing Functions Filters Convolution and Correlation Miscellaneous The routines are all re-entrant. That is, they do not use any local static variables. This enables them to be used in a multi-tasking environment, or to be called from interrupt routines without aecting their use in the main program. Timings are given for the most time-critical routines. The timings are for a 10 MIPS SH-1, with all internal program and data accesses taking a single Page 7 of 87 processor cycle over the 32 bit memory bus. Timings for other clock speeds can be derived by scaling those given here. The routines provided in this library have been optimised for the SH-1 processor. SH-2 and SH-3 processors will also bene t from DSPLib-SH, but there will be some dierences in execution times compared to the SH-1. Page 8 of 87 2 Installation DSPLib-SH is shipped on a single 3 12 " MS-DOS disk, the contents of which are listed in Appendix A. It comprises a C header le, ensigma.h, the library routines themselves and examples of the library's use. The optimised library is in the library directory, in the pre-assembled object library ensigma.lib. The source for the optimised and ansi-C libraries are supplied in the optsrc and ansisrc Examples showing the use of each routine are provided in the examples directory on the disk. The header le and libraries should be installed in a directory visible to the C compiler and linker respectively. It is recommended that the same directories as the compiler support les is used. Assuming that this directory is C:nshcnshc2.0 and the oppy disk drive is A:, the installation procedure on an MSDOS PC is as follows| 1. Make a backup of the distribution disk. 2. Copy DSPLib-SH onto the system disk. Put the distribution disk into drive A: and enter the following: n n XCOPY A:\* C: shc shc2.0 /S 3. Installation is now complete. Should it be necessary to rebuild the optimised library from source| 1. Change to the optsrc directory. 2. Edit make.bat and lbr.sub to check that the con guration matches your system. 3. Type make. 4. ensigma.lib has now been rebuilt. For the C version of the library, the build process is both operating system and compiler dependent. For a Unix operating system, the procedure is as follows| Page 9 of 87 1. 2. 3. 4. 5. Copy DSPLib-SH onto the system disk. Change to the ansisrc directory. Edit makefile to correspond to your system. Type make. ensigmaC.a has now been built. Page 10 of 87 3 Data Formats The majority of calculations performed in DSPLib-SH use xed point arithmetic. Both integer and fractional number representations are used; these formats are discussed below. Integer arithmetic represents numbers with the decimal point xed at the right of the LSB (least signi cant bit). A signed 16 bit integer variable can represent values in the range [-215:215{1]. When two signed 16 bit integers are multiplied, the signed 31 bit product is aligned at the LSB. If the result is stored in a 16 bit integer numerical over ow may occur. Fractional arithmetic is also useful, especially in DSP applications. Here the decimal point is xed immediately to the right of the most signi cant bit (MSB) for signed numbers (or immediately to the left of the MSB for unsigned numbers). Signed 16 bit fractions can represent values in the range [-1:1-2,15 ]. When two signed 16 bit fractions are multiplied the signed 31 bit product is aligned at the MSB. If the result is stored as a 16 bit fraction numerical under ow may occur. DSPLib-SH makes extensive use of both integer and fractional arithmetic. To maximise the exibility of the library many of the routines allow the alignment of each output to be speci ed. For example, calling Fir with res shift = 0 corresponds to LSB aligned (integer) lter coecients. When res shift = 15 the processing corresponds to MSB aligned (fractional) lter coecients. For most possible values of res shift, there is a possibility of numerical over ow. Except where stated otherwise, DSPLib-SH detects this condition and saturates the value to the appropriate full-scale value. This minimises the error introduced, at the expense of a small increase in processing time when over ow occurs. Implementation The C short and int types provide xed point arithmetic. On the SH, signed 16 bit integers correspond to short and signed 32 bit integers correspond to long or int. Page 11 of 87 4 ANSI C Library An ANSI-C version of DSPLib-SH is provided in A:ndsplibshnansisrc. This version is supplied to ease application development, allowing the early stages to be performed without the constraints imposed by particular hardware. The C library and optimised SH-1 library give bit identical results for most valid inputs, for all routines except the FFT and double precision lter routines, where errors in the least signi cant bits may occur. The exception to this rule occurs when the 42 bit accumulator (MACH/MACL) over ows. In this case the assembler library on an SH-1, the assembler library on an SH-2 and the C library on a host can all give dierent results. However, this can only happen with input vectors containing more than 1023 samples, so it is extremely unlikely in any practical use of DSPLib-SH. Compiler Restrictions The C library must be build with an ANSI C compiler, with the restriction that variables of type double allow the exact representation of 42 bit integers, and the pointer representation must require four or fewer bytes. In practise all Unix and MSDOS ANSI C compilers are suitable as they use 8 bytes to hold double variables, typically allowing the exact representation of integers of up to 53 bits. Page 12 of 87 5 Eciency The routines provided by DSPLib-SH have been hand optimised for maximum speed on SH-1 processors. In many cases the library routines approach the theoretical optimum performance of the architecture, with minimum instruction counts and minimal pipeline con icts. As SH is a RISC architecture, it is relatively straightforward to maximise the eciency of the library based on the following two features: The assembly routines provided by DSPLib-SH have been written so that most program fetches do not cause processor stalls, as long as the program memory supports 32 bit reads; furthermore, most of the processing in DSPLib-SH operates on 16 bit data. Therefore, when de ning the memory map of a target system, the following two recommendations should be obeyed whenever possible: the program code segment should be located in memory that supports single cycle 32 bit reads, and the data segment should be located in memory that supports single cycle 16 (or 32) bit reads and writes. If there is sucient internal 32 bit memory, this would be a suitable location for the library code and data. If other memory must be used the recommendations above should be followed whenever possible | because dierent devices in the SH family contain dierent types of memory and support dierent caching schemes, it is not possible to provide completely general advice. Page 13 of 87 6 Test and Example Programs Test programs for all library routines are provided in the examples directory on the distribution disk. These carefully test each library routine in turn. Among other things they test for correct argument checking and perform a number of bit-exact tests comparing the results with precalculated data. These tests also illustrate the use of the library routines. The test functions are supplied in the subdirectory test. Test data is in the testdat subdirectory. The main test les are| testfft.c testwin.c testfilt.c testconv.c testmisc.c FFT test functions. Window test functions. Filter test functions. Convolution and correlation test functions. Miscellaneous test functions. These are built using the make utility. Typing make all builds testfft.abs and testfft.mot etc. Alternatively individual test functions may be made by typing eg. make testconv.abs. In addition to the individual test les and function library, the following les are provided| testhead.h link.cmd c0.src make.bat makefile Prototypes for test functions. Linker subcommand le based on the SH7030 memory map. C startup le. MSDOS build commands. (Unix) make le. A simple demonstration program that illustrates the use of common library functions is provided in the directory demo. This illustrates the use of t, window, lter and miscellaneous functions. The demonstration program is built using the following les: Page 14 of 87 demo.c demoio.c demoio.h link.cmd c0.src make.bat makefile The demonstration program. Data input/output functions. These functions simulate the data interfaces that would be present in a complete system. Function prototypes for demoio.c. Linker subcommand le based on the SH7030 memory map. C startup le. MSDOS build commands. (Unix) make le. Page 15 of 87 7 Fast Fourier Transforms Contents| FftComplex FftReal ItComplex ItReal FftInComplex FftInReal ItInComplex ItInReal LogMagnitude InitFft FreeFft Compute not-in-place, complex FFT. Compute not-in-place, real FFT. Compute not-in-place, complex inverse FFT. Compute not-in-place, real inverse FFT. Compute in-place, complex FFT. Compute in-place, real FFT. Compute in-place, complex inverse FFT. Compute in-place, real inverse FFT. Convert complex FFT output to log magnitude format. Generate FFT lookup tables. Release FFT lookup table memory. These functions calculate the classical forward and inverse Fourier transforms, with a user de ned scaling. The forward transform is de ned by N X yn = 2,s e,2jn=N :xn n=0 where s (shift) is the number of stages where an additional halving is performed (see the Scaling section below) and N is the number of complex points. The inverse transform is de ned by N X yn = 2,s e2jn=N :xn n=0 Page 16 of 87 Routine Timings The execution time in milliseconds of the FFT routines on a 10 MIPS SH-1 are| Size 128 256 512 1024 Not in placey In place FFT Inverse FFT FFT Inverse FFT Cmplx Real Cmplx Real Cmplx Real Cmplx Real 1.94 1.06 2.19 1.20 1.94 1.06 2.20 1.19 4.43 2.37 4.94 2.66 4.46 2.37 4.97 2.64 9.98 5.29 11.00 5.88 10.03 5.32 11.06 5.86 22.23 11.70 24.28 12.88 22.39 11.75 24.44 12.83 Complex Array Format Time and frequency series of complex samples are used as inputs and outputs to the Fourier transform routines. The complex array format used in all these routines is alternating real-imaginary i.e. a sequence of N complex numbers c is stored in an array x| x[2n] = Refc(n)g; x[2n + 1] = Imfc(n)g 0n 32768. Description : This routine calculates a complex Fast Fourier Transform. The calculation is not-in-place, so the input and output arrays must not overlap (see FftInComplex for an in-place version of this routine). Notes : The ordering of real and imaginary components in a complex array is described on page 16. Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts output should be located in on-chip memory. Page 20 of 87 FftReal De nition : int FftReal( output, input, size, scale ) Arguments : short short long long output[ ] input[ ] size scale Returns : int EDSP OK EDSP BAD ARG positive frequency complex output data. real input data size of FFT. scaling speci cation. success. size < 8. size not a power of 2. size > 32768. Description : This routine calculates a real Fast Fourier Transform by performing a complex FFT of half the size and then transforming the data. On returning, output contains positive frequencies only. The negative frequencies are simply the complex conjugate of the positive frequencies. Since the frequency values at both 0 and Fs=2 are real, the Fs=2 output is placed in the second element of output i.e. where the imaginary component of zero frequency would have been stored. The calculation is not-in-place, so the input and output arrays must not overlap (see FftInReal for an in-place version of this routine). Notes : The ordering of real and imaginary components in a complex array is described on page 16. Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. Page 21 of 87 scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts output should be located in on-chip memory. Page 22 of 87 ItComplex De nition : int ItComplex( output, input, size, scale ) Arguments : short short long long output[ ] input[ ] size scale Returns : int EDSP OK EDSP BAD ARG complex output data. complex input data. size of inverse FFT. scaling speci cation. success. size < 4. size not a power of 2. size > 32768. Description : This routine calculates an inverse Fast Fourier Transform. The calculation is not-in-place, so the input and output arrays must not overlap (see ItInComplex for an in-place version of this routine). Notes : The ordering of real and imaginary components in a complex array is described on page 16. Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts output should be located in on-chip memory. Page 23 of 87 ItReal De nition : int ItReal( output, input, size, scale ) Arguments : short short long long output[ ] input[ ] size scale Returns : int EDSP OK EDSP BAD ARG real output data. positive frequency complex input data. size of inverse FFT. scaling speci cation. success. size < 8. size not a power of 2. size > 32768. Description : This routine calculates an inverse Fast Fourier Transform by transforming the data and performing a complex FFT of half the size. input should contain the complex frequency data for the positive frequencies only. The routine assumes that the negative frequencies are simply the complex conjugate of the positive frequencies. Since the frequency values at both 0 and Fs=2 can only be real, the Fs=2 value should be placed in the second element of input where the imaginary component of 0 would have been stored. When ItReal returns output contains a series of real data samples. The calculation is not-in-place, so the input and output arrays must not overlap (see ItInReal for an in-place version of this routine). Notes : The ordering of real and imaginary components in a complex array is described on page 16. Page 24 of 87 Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts input and output should both be located in on-chip memory. Page 25 of 87 FftInComplex De nition : int FftInComplex( data, size, scale ) Arguments : short long long data[ ] complex data. size size of FFT. scale scaling speci cation. Returns : int EDSP OK EDSP BAD ARG success. size < 4. size not a power of 2. size > 32768. Description : This routine calculates an in-place complex Fast Fourier Transform. Notes : The ordering of real and imaginary components in a complex array is described on page 16. Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts data should be located in on-chip memory. Page 26 of 87 FftInReal De nition : int FftInReal( data, size, scale ) Arguments : short long long data[ ] real data on input, complex positive frequency data on output. size size of FFT. scale scaling speci cation. Returns : int EDSP OK EDSP BAD ARG success. size < 8. size not a power of 2. size > 32768. Description : This routine calculates an in-place real Fast Fourier Transform by performing a complex FFT of half the size and then transforming the data. On returning, data contains the complex transform data for the positive frequencies only. The negative frequencies are simply the complex conjugate of the positive frequencies. Since the frequency values at both 0 and Fs=2 are real, the Fs=2 output is placed in the second element of data i.e. where the imaginary component of 0 would have been stored. Although this slightly complicates the output format, it enables the FFT to be carried out in place. Notes : The ordering of real and imaginary components in a complex array is described on page 16. Page 27 of 87 Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts data should be located in on-chip memory. Page 28 of 87 ItInComplex De nition : int ItInComplex( data, size, scale ) Arguments : short long long data[ ] complex data. size size of inverse FFT. scale scaling speci cation. Returns : int EDSP OK EDSP BAD ARG success. size < 4. size not a power of 2. size > 32768. Description : This routine calculates an in-place complex inverse Fast Fourier Transform. Notes : The ordering of real and imaginary components in a complex array is described on page 16. Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts data should be located in on-chip memory. Page 29 of 87 ItInReal De nition : int ItInReal( data, size, scale ) Arguments : short long long data[ ] complex positive frequency data on input, real data on output. size size of inverse FFT. scale scaling speci cation. Returns : int EDSP OK EDSP BAD ARG success. size < 8. size not a power of 2. size > 32768. Description : This routine calculates an in-place real inverse Fast Fourier Transform by transforming the data and performing a complex FFT of half the size. data should contain the complex frequency data for the positive frequencies only. The routine assumes that the negative frequencies are simply the complex conjugate of the positive frequencies. Since the frequency values at both 0 and Fs=2 can only be real, the Fs=2 value should be placed in the second element of data where the imaginary component of 0 would have been stored. When ItInReal returns data contains a series of real samples. Notes : The ordering of real and imaginary components in a complex array is described on page 16. Page 30 of 87 Before calling this routine the bit reversal and twiddle tables should be initialised by calling InitFft. scale should be 0xffffffff and the peak signal power should be < 230 unless the properties of the input signal are well understood. Only the bottom log2(size) bits of scale are used; the MSB's are ignored. To minimise bus con icts data should be located in on-chip memory. Page 31 of 87 LogMagnitude De nition : int LogMagnitude( output, input, no elements, fscale ) Arguments : short short long float output[ ] real output y. input[ ] complex input x no elements number of log magnitude outputs N required. fscale output scaling factor Returns : int EDSP OK EDSP BAD ARG success. no elements < 1. jfscalej 215=(10 log10 231). Description : This routine calculates the log magnitude of the complex input data in decibels, and writes the scaled result into the output array| y(n) = 10fscale log10(x(2n)2 + x(2n + 1)2 ) 0n 32768. Description : This routine generates the sine/cosine and bit reversal tables used by the FFT functions. This function need only be called once, before the rst FFT is called. The lookup tables are stored in memory allocated by malloc. Notes : The lookup tables are generated for the speci ed transform size. Smaller transforms will be performed using the same lookup tables. The addresses of the lookup tables are stored in internal variables; they should not be accessed by user code. Page 33 of 87 FreeFft De nition : void FreeFft( void ) Arguments : none Returns : void Description : This routine returns the memory used for FFT lookup tables to the heap via free. No further FFT functions can be called before a subsequent InitFft. Page 34 of 87 8 Windowing Contents| GenBlackman GenHamming GenHanning GenTriangle Generate a Blackman window. Generate a Hamming window. Generate a Hanning window. Generate a Triangle window. Test Functions The test le testwin.c is supplied for the routines in this section. The function Tstwin compares the window functions against precalculated data to ensure accuracy. Note that the windows produced on dierent architectures may dier in the least signi cant bit. Page 35 of 87 GenBlackman De nition : int GenBlackman( output, win size ) Arguments : short long output[ ] array to contain window w(n). win size window size N required. Returns : int EDSP OK EDSP BAD ARG Description : success. win size 1. This routine generates a Blackman window in output. VectorMult can be used to apply the window to a data array. The function used is| 2n 4n 15 w(n) = (2 ,1) 0:42 , 0:5 cos N + 0:08 cos N 0 n < N Page 36 of 87 GenHamming De nition : int GenHamming( output, win size ) Arguments : short long output[ ] array to contain window w(n). win size window size N required. Returns : int EDSP OK EDSP BAD ARG Description : success. win size 1. This routine generates a Hamming window in output. VectorMult can be used to apply the window to a data array. The function used is| 2n 15 w(n) = (2 , 1) 0:54 , 0:46 cos N 0n 27. Description : This routine implements a nite impulse response (FIR) lter. It uses workspace to record the most recent input samples. The result of ltering the data in input is written to output| "KX,1 # y(n) = h(k)x(n , k) :2,res shift k=0 The sum is accumulated in 42 bits. Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the accumulator. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. Page 43 of 87 Notes : Filtering is likely to introduce noise due to saturation or quantisa- tion. If the coecients are too large saturation may occur; if they are too low excessive quantisation noise may be introduced. The scaling of coecients must be carefully considered to balance the eects of quantisation and saturation. One scheme that avoids saturation is to scale the coecients so that P abs(coes) < 226 and res shift = 26. For many input signals smaller result shifts could be used with a low likelihood of saturation, but with signi cantly reduced quantisation noise. Before calling this routine for a new lter, initialise the lter workspace by calling InitFir. Execution is fastest when res shift is set to 17 or 18. output may be the same as input in which case input is overwritten. Page 44 of 87 Fir1 De nition : int Fir1( output, input, coe, no coes, res shift, workspace ) Arguments : short short short long int short output sample y(n). input sample x(n). array of lter coecients h. number of coecients K (length of lter). right shift applied to each output. res shift *workspace variable containing pointer to lter memory. *output input coe[ ] no coes Returns : int EDSP OK EDSP BAD ARG success. no coe 2. res shift < 0. res shift > 27. Description : This routine implements a nite impulse response (FIR) lter for one sample only. It uses workspace to record the most recent input samples. The result of ltering the data in input is written to output| "KX,1 # y(n) = h(k)x(n , k) :2,res shift k=0 The sum is accumulated in 42 bits. Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the accumulator. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. Notes : Page 45 of 87 Filtering is likely to introduce noise due to saturation or quanti- sation. If the coecients are too large saturation may occur; if they are too low excessive quantisation noise may be introduced. the scaling of coecients must be carefully considered to balance the eects of quantisation and saturation. One scheme that avoids saturation is to scale the coecients so that P abs(coes) < 226 and res shift = 26. For many input signals smaller result shifts could be used with a low likelihood of saturation, but with signi cantly reduced quantisation noise. Before calling this routine for a new lter, initialise the lter workspace by calling InitFir. Execution is fastest when res shift is set to 17 or 18. Page 46 of 87 Iir De nition : int Iir( output, input, no samples, coe, no sections, workspace ) Arguments : short short long short long short output samples yK,1. input samples x0. number of samples N to be ltered. lter coecients. number of second order lter sections K. *workspace variable containing pointer to lter memory. output[ ] input[ ] no samples coe[ ] no sections Returns : int EDSP OK EDSP BAD ARG success. no samples < 1. no sections < 1. a0k < 0. a0k > 16. Description : This routine implements an in nite impulse response (IIR) lter. The lter is implemented as a cascade of K second order lters called biquads, with an additional scaling performed on the biquad output. This implementation uses the Direct Form II representation, where the a coecients are speci ed in fractional (Q15) format and the output of each biquad is given by h i dk (n) = a1kdk (n , 1) + a2k dk (n , 2) + 215x(n) :2,15 yk (n) = [b0k dk (n) + b1k dk (n , 1) + b2kdk (n , 2)] :2,a0k Page 47 of 87 The input xk (n) to the kth section is the output yk,1(n) of the previous section. The input to the rst (k = 0) section is taken from input. The output from the last (k = K , 1) section is written to output. Each biquad is calculated in 32 bits using saturating arithmetic. Following the multiply by 215 or 2,a0k , 16 LSB's are extracted from the accumulator. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. The lter coecients should be speci ed in coe in the order| a00; a10; a20; b00; b10; b20; a01; a11; a21; b01:::b2K,1 Notes : Before calling this routine for a new lter, initialise the lter by calling InitIir. a0k describes a shift rather than a multiply. The equivalent multiply would be by 2,a0k . If a0k < 0 or a0k > 16 the output of the biquad is unde ned. output may be the same as input in which case input is overwritten. To minimise bus con icts workspace should be located in on-chip memory. Page 48 of 87 Iir1 De nition : int Iir1( output, input, coe, no sections, workspace ) Arguments : short short short long short output sample yK,1(n). input sample x0(n). lter coecients. number of second order lter sections K. *workspace variable containing pointer to lter memory. *output input coe[ ] no sections Returns : int EDSP OK EDSP BAD ARG Description : success. no sections < 1. a0k < 0. a0k > 16. This routine implements an in nite impulse response (IIR) lter for one sample only. The lter is implemented as a cascade of K second order lters called biquads, with an additional scaling performed on the biquad output. This implementation uses the Direct Form II representation, where the a coecients are speci ed in fractional (Q15) format and the output of each biquad is given by h i dk (n) = a1kdk (n , 1) + a2k dk (n , 2) + 215x(n) :2,15 yk (n) = [b0k dk (n) + b1k dk (n , 1) + b2kdk (n , 2)] :2,a0k The input xk (n) to the kth section is the output yk,1(n) of the previous section. The input to the rst (k = 0) section is taken from input. The output from the last (k = K , 1) section is written to output. coe should contain coecients in the order| Page 49 of 87 a00; a10; a20; b00; b10; b20; a01; a11; a21; b01:::b2K,1 Each biquad is calculated in 32 bits using saturating arithmetic. Following the multiply by 215 or 2,a0k , 16 LSB's are extracted from the accumulator. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. Notes : Before calling this routine for a new lter, initialise the lter by calling InitIir. a0k describes a shift rather than a multiply. The equivalent multiply would be by 2,a0k . If a0k < 0 or a0k > 16 the output of the biquad is unde ned. To minimise bus con icts workspace should be located in on-chip memory. Page 50 of 87 DIir De nition : int DIir( output, input, no samples, coe, no sections, workspace ) Arguments : short short long long long long output samples yK,1. input samples x. number of samples N to be ltered. lter coecients. number of second order lter sections K. *workspace variable containing pointer to lter memory. output[ ] input[ ] no samples coe[ ] no sections Returns : int EDSP OK EDSP BAD ARG success. no samples < 1. no sections < 1. a0k < 0. a0k > 32 for k < K , 1. a0k > 48 for k = K , 1. Description : This routine implements an in nite impulse response (IIR) lter with double precision coecients and delay node storage. The lter is implemented as a cascade of K second order lters called biquads, with an additional scaling performed on the biquad output. This implementation uses the Direct Form II representation, where the a coecients are speci ed in fractional (Q31) format and the output of each biquad is given by h i dk (n) = a1kdk (n , 1) + a2k dk (n , 2) + 231x(n) :2,31 yk (n) = [b0k dk (n) + b1k dk (n , 1) + b2kdk (n , 2)] :2,a0k Page 51 of 87 The input xk (n) to the kth section is the output yk,1(n) of the previous section. The input to the rst (k = 0) section is taken from input after a multiply by 216. The output from the last (k = K , 1) section is written to output. coe should contain coecients in the order| a00; a10; a20; b00; b10; b20; a01; a11; a21; b01:::b2K,1 DIir diers from Iir in that the lter coecients are speci ed as 32 rather than 16 bit values. Consequently 64 bit accumulators is used, with intermediate biquad outputs stored as 32 bit quantities following the shifts by 31 or a0k . This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary.In the nal stage only 16 bits are retained following the shift by a0K,1; again surplus LSB's are rounded and surplus MSB's are saturated. Notes : Before calling this routine for a new lter, initialise the lter by calling InitDIir. a0k describes a shift rather than a multiply. The equivalent multiply would be by 2,a0k . If a0k < 0 or a0k > 32 (or a0K,1 > 48) the output of the biquad is unde ned. The most common use of DIir speci es the coecients in Q31 format. In this case a0k should be set to 31 for k < K , 1 and to 47 for k = K , 1. The least signi cant bit of each 32 bit coecient and delay node value is ignored. When possible Iir should be used in preference to DIir as it runs an order of magnitude faster on the SH-1. output may be the same as input in which case input is overwritten. To minimise bus con icts workspace should be located in on-chip memory. Page 52 of 87 DIir1 De nition : int DIir1( output, input, coe, no sections, workspace ) Arguments : short short long long long output sample yK,1(n). input sample x0(n). lter coecients. number of second order lter sections K. *workspace variable containing pointer to lter memory. *output input coe[ ] no sections Returns : int EDSP OK EDSP BAD ARG success. no sections < 1. a0k < 0. a0k > 32 for k < K , 1. a0k > 48 for k = K , 1. Description : This routine implements a double precision in nite impulse response (IIR) lter for one sample only. The lter is implemented as a cascade of K second order lters called biquads, with an additional scaling performed on the biquad output. This implementation uses the Direct Form II representation, where the a coecients are speci ed in fractional (Q31) format and the output of each biquad is given by h i dk (n) = a1kdk (n , 1) + a2k dk (n , 2) + 231x(n) :2,31 yk (n) = [b0k dk (n) + b1k dk (n , 1) + b2kdk (n , 2)] :2,a0k The input xk (n) to the kth section is the output yk,1(n) of the previous section. The input to the rst (k = 0) section is taken from input after Page 53 of 87 a multiply by 216. The output from the last (k = K , 1) section is written to output. coe should contain coecients in the order| a00; a10; a20; b00; b10; b20; a01; a11; a21; b01:::b2K,1 As with DIir the lter coecients and delay node values are stored as 32 rather than 16 bit values. Consequently 64 bit accumulators are used, with intermediate biquad outputs stored as 32 bit quantities following the shift by 31 or a0k. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary.In the nal stage only 16 bits are retained following the shift by a0K,1; again surplus LSB's are rounded and surplus MSB's are discarded. Notes : Before calling this routine for a new lter, initialise the lter by calling InitDIir. a0k describes a shift rather than a multiply. The equivalent multiply would be by 2,a0k . If a0k < 0 or a0k > 32 (or a0K,1 > 48) the output of the biquad is unde ned. The most common use of DIir speci es the coecients in Q31 format. In this case a0k should be set to 31 for k < K , 1 and to 47 for k = K , 1. The least signi cant bit of each 32 bit coecient and delay node value is ignored. When possible Iir should be used in preference to DIir as it runs an order of magnitude faster on the SH-1. To minimise bus con icts workspace should be located in on-chip memory. Page 54 of 87 Lms De nition : int Lms( output, input, ref, no samples, coe, no coes, res shift, conv fact, workspace ) Arguments : short short short long short long int short short output[ ] input[ ] ref output[ ] no samples coe[ ] no coes res shift conv fact *workspace Returns : int EDSP OK EDSP BAD ARG output samples y. input samples x. desired output value d. number of samples N to be ltered. adaptive lter coecients h. number of coecients K . right shift applied to each output. convergence factor 2. variable containing pointer to lter memory. success. no samples < 1. no coes 2. res shift < 0. res shift > 27. Description : This routine implements a real adaptive FIR lter using the least mean square algorithm. The FIR lter is de ned as| "KX # ,1 y(n) = hn (k)x(n , k) :2,res shift k=0 The sum is accumulated in 42 bits. Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the accumulator. This extraction Page 55 of 87 rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. The Widrow-Ho algorithm is used to update the lter coecients| hn+1 (k) = hn(k) + 2e(n)x(n , k) where the addition is saturating and e(n) is the (saturated) error between the desired signal and the actual lter output| e(n) = d(n) , y(n) The calculation of 2e(n)x(n , k) requires two 1616 multiplies. In both multiplies bits [31:16] of the product are retained, rounded and saturated if necessary. Notes : Any digital signal processing can introduce noise due to saturation or quantisation eects. If the coecients are too large saturation may occur; if they are too low excessive quantisation noise may be introduced. The scaling of coecients must be carefully considered to balance the eects of quantisation and saturation. One scheme that avoids saturation is to scale the coecients so P that abs(coes) < 226 and res shift = 26. For many input signals smaller result shifts could be used with a low likelihood of saturation, but with signi cantly reduced quantisation noise. However, it is not possible to guarantee that the coecients generated by an LMS lter will conform to the scaling convention described above. conv fact should normally be positive; it should never be 0x8000. Before calling this routine for a new lter, initialise the lter workspace by calling InitLms. Execution is fastest when res shift is set to 17 or 18. output may be the same as input or ref output in which case input or ref output is overwritten. Page 56 of 87 If no adaption is required 2 may be set to zero. Alternatively Fir or Fir1 may be used with the same lter coecients and workspace. Page 57 of 87 Lms1 De nition : int Lms1( output, input, ref output, coe, no coes, res shift, conv fact, workspace ) Arguments : short short short short long int short short *output input ref coe[ ] no coes res shift conv fact *workspace Returns : int EDSP OK EDSP BAD ARG output sample y(n). input sample x(n). desired output value d(n). adaptive lter coecients h. number of coecients K . right shift applied to each output. convergence factor 2. variable containing pointer to lter memory. success. no coes 2. res shift < 0. res shift > 27. Description : This routine implements a real adaptive FIR lter using the least mean square algorithm, for a single input sample. The FIR lter is de ned as| "KX # ,1 y(n) = hn (k)x(n , k) :2,res shift k=0 The sum is accumulated in 42 bits. Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the accumulator. This extraction Page 58 of 87 rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. The Widrow-Ho algorithm is used to update the lter coecients| hn+1 (k) = hn(k) + 2e(n)x(n , k) where the addition is saturating and e(n) is the (saturated) error between the desired signal and the actual lter output| e(n) = d(n) , y(n) The calculation of 2e(n)x(n , k) requires two 1616 multiplies. In both multiplies bits [31:16] of the product are retained, rounded and saturated if necessary. Notes : Any digital signal processing can introduce noise due to saturation or quantisation eects. If the coecients are too large saturation may occur; if they are too low excessive quantisation noise may be introduced. The scaling of coecients must be carefully considered to balance the eects of quantisation and saturation. One scheme that avoids saturation is to scale the coecients so P that abs(coes) < 226 and res shift = 26. For many input signals smaller result shifts could be used with a low likelihood of saturation, but with signi cantly reduced quantisation noise. However, it is not possible to guarantee that the coecients generated by an LMS lter will conform to the scaling convention described above. conv fact should normally be positive; it should never be 0x8000. Before calling this routine for a new lter, initialise the lter workspace by calling InitLms. Execution is fastest when res shift is set to 17 or 18. output may be the same as input or ref output in which case the input or ref output is overwritten. Page 59 of 87 If no adaption is required 2 may be set to zero. Alternatively Fir or Fir1 may be used with the same lter coecients and workspace. Page 60 of 87 InitFir De nition : int InitFir( workspace, no coes ) Arguments : short long **workspace address of variable containing pointer to buer reserved for lter memory. no coes number of coecients K . Returns : int EDSP OK EDSP NO HEAP EDSP BAD ARG success. insucient space available from malloc. no coes 2. Description : This routine allocates, via malloc, the memory required for subsequent calls to Fir and Fir1. The history of previous input samples is initialised to zero. Notes : The workspace buer allocated by InitFir should only be manipulated by Fir, Fir1, Lms and Lms1. Page 61 of 87 InitIir De nition : int InitIir( workspace, no sections ) Arguments : short long **workspace address of variable containing pointer to buer reserved for lter memory. no sections number of second order lter sections K. Returns : int EDSP OK EDSP NO HEAP EDSP BAD ARG success. insucient space available from malloc. no sections < 1. Description : This routine allocates, via malloc, the memory required for subsequent calls to Iir and Iir1. All delay node values are initialised to zero. Notes : The workspace buer allocated by InitIir should only be manipulated by Iir and Iir1. Page 62 of 87 InitDIir De nition : int InitDIir( workspace, no sections ) Arguments : long long **workspace address of variable containing pointer to buer reserved for lter memory. no sections number of second order lter sections K. Returns : int EDSP OK EDSP NO HEAP EDSP BAD ARG success. insucient space available from malloc. no sections < 1. Description : This routine allocates, via malloc, the memory required for subsequent calls to DIir and DIir1. All delay node values are initialised to zero. Notes : The workspace buer allocated by InitDIir should only be manipulated by DIir and DIir1. Page 63 of 87 InitLms De nition : int InitLms( workspace, no coes ) Arguments : short long **workspace address of variable containing pointer to buer reserved for lter memory. no coes number of coecients K . Returns : int EDSP OK EDSP NO HEAP EDSP BAD ARG success. insucient space available from malloc. no coes 2. Description : This routine allocates, via malloc, the memory required for subsequent calls to Lms and Lms1.The history of previous input samples is initialised to zero. Notes : The workspace buer allocated by InitLms should only be manipulated by Fir, Fir1, Lms and Lms1. Page 64 of 87 10 Convolution and Correlation Contents| ConvComplete ConvCyclic ConvPartial Correlate CorrCyclic Calculate the complete convolution of two arrays. Calculate the cyclic convolution of two arrays. Calculate the partial convolution of two arrays. Calculate the correlation between two arrays. Calculate the cyclic correlation between two arrays. Routine Timings The execution time in microseconds of the lter routines on a 10 MIPS SH-1 are given below for ConvPartial, ConvCyclic, CorrCyclic and Correlate (with no correlations past the end of the longer input). The times are given per output sample, computed by dividing the time to produce K output samples by K . Note that iw is always the smaller input array. Size of Routine iw K ConvPartial ConvCyclic CorrCyclic Correlate 5 5 9.7 9.7 9.0 8.1 10 10 10.6 11.2 10.2 9.0 100 100 54.6 55.4 47.7 46.4 Where correlations or convolutions past the ends of arrays take place, as happens in ConvComplete and sometimes in Correlate, the output samples are produced with less computation. The most extreme case of this is when the two input arrays are the same size n, and the computations continue until the arrays overlap by only one sample. In this case ConvComplete produces 2n , 1 samples, and Correlate produces n samples, with all but one of those samples formed from computation past array ends. Page 65 of 87 For this extreme case, the timings in microseconds per output sample, again computed by dividing the time to produce K output samples by K , are| Size of Routine iw K ConvComplete Correlate 5 9/5 6.5 8.4 10 19/10 7.1 8.2 100 199/100 29.0 26.1 Test Functions The Convolution/Correlation functions are tested for bit exactness by the le testconv.c which calls the functions Tconvcmp, Tconvcyc, Tconvpar, Tcorr and Tcorrcyc. Page 66 of 87 ConvComplete De nition : int ConvComplete( output, iw, ix, iw size, ix size, res shift ) Arguments : short short short long long int output y. input w. input x. size of iw W . size of ix X . right shift applied to each output. output[ ] iw[ ] ix[ ] iw size ix size res shift Returns : int EDSP OK EDSP BAD ARG success. iw size < 1. ix size < 1. res shift < 0. res shift > 27. Description : This routine completely convolves the two input arrays and puts the result in the output array| "WX,1 # y(m) = w(i)x(m , i) :2,res shift 0m 27. Description : This routine cyclically convolves the two input arrays and puts the result in the output array| "NX # ,1 y(m) = w(i)x(jm , i + N jN ) :2,res shift 0m 27. Description : This routine convolves the two input arrays but does not include outputs derived from elements outside the array boundaries| y(m) = "WX,1 i=0 # w(i)x(m + W , 1 , i) :2,res shift 0mX ,W The sum is accumulated in 42 bits. Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the accumulator. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. Notes : Page 70 of 87 The output array size must be at least X , W + 1. x must be the larger array, i.e. X W . To minimise bus con icts iw should be located in on-chip memory. Page 71 of 87 Correlate De nition : int Correlate( output, iw, ix, iw size, ix size, no corr, res shift ) Arguments : short short short long long long int output[ ] iw[ ] ix[ ] iw size ix size no corr res shift Returns : int EDSP OK EDSP BAD ARG output y. smaller input w. larger input x. size of iw W . size of ix X . number of correlations M to compute. right shift applied to each output. success. iw size < 1. ix size < 1. no corr < 1. ix size < iw size. res shift < 0. res shift > 27. Description : This routine correlates the two input arrays and puts the result in the output array| "WX,1 # y(m) = w(i)x(i + m) :2,res shift 0m 27. Description : This routine cyclically correlates the two input arrays and puts the result in the output array| "NX # ,1 y(m) = w(i)x(ji + mjN ) :2,res shift 0m 27. This routine multiplies the two matrices ip matrix1 and ip matrix2 and stores the result in op matrix. ip matrix1 is m n, ip matrix2 is n p and op matrix is m p. The sum is accumulated in 42 bits. Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the accumulator. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. Each matrix is stored in the normal C' manner (row major order)| 0 1 a0 a1 a2 a3 B@ a4 a5 a6 a7 CA a8 a9 a10 a11 Page 78 of 87 Notes : To minimise bus con icts ip matrix1 should be located in on-chip memory. The function prototype speci es the array parameters as void * to allow arbitrary array sizes to be speci ed. These parameters should point to short data. Page 79 of 87 VectorMult De nition : int VectorMult( output, ip1, ip2, no elements, res shift ) Arguments : short short short long int output[ ] ip1[ ] ip2[ ] no elements res shift Returns : int EDSP OK EDSP BAD ARG output. input. input. number of elements to evaluate. right shift applied to each output. success. no elements < 1. res shift < 0. res shift > 16. Description : This routine multiplies pairs of elements from ip1 and ip2 and stores the results in output. Notes : Each 16 bit output is extracted from bit positions [res shift+15:res shift] of the product. This extraction rounds the surplus LSB's and checks the surplus MSB's for over ow, saturating if necessary. This routine performs elementwise multiplications. To calculate a dot product use MatrixMult with n set to 1. Page 80 of 87 MsPower De nition : int MsPower( output, input, no elements ) Arguments : long short long *output result. input[ ] input. no elements number of elements N to evaluate. Returns : int EDSP OK EDSP BAD ARG success. no elements < 1. Description : This routine calculates the Mean Square Power of the input data| NX ,1 Mean Square Power = N1 x(i)2 i=0 Notes : The division result is rounded to the nearest integral value. The sum is accumulated in 64 bits. If no elements is more than 232-1 over ow may occur. Page 81 of 87 Mean De nition : int Mean( mean, input, no elements ) Arguments : short short long *mean mean of data in input x. input[ ] input x. no elements number of elements N to process. Returns : int EDSP OK EDSP BAD ARG success. no elements < 1. Description : This routine calculates the mean of input| N X,1 x = N1 x(i) i=0 Notes : The division result is rounded to the nearest integral value. The sum is accumulated in 32 bits. If no elements is more than 216-1 over ow may occur. Page 82 of 87 Variance De nition : int Variance( variance, mean, input, no elements ) Arguments : long short short long *variance *mean input[ ] no elements Returns : int EDSP OK EDSP BAD ARG The variance of input 2. mean of data x. input x. number of elements N to process. success. no elements < 1. Description : This routine rst calculates the mean of input| N X,1 1 x = N x(i) i=0 It then computes the variance| NX ,1 2 = N1 (x(i) , x)2 i=0 Notes : The division results are rounded to the nearest integral values. x is accumulated in 32 bits and is not checked for over ow. If no elements is more than 216-1 over ow may occur. 2 is accumulated in 64 bits and is not checked for over ow. Page 83 of 87 MaxI De nition : int MaxI( max ptr, input, no elements ) Arguments : short short long **max ptr address of pointer to the maximum element. input[ ] input. no elements number of elements to process. Returns : int EDSP OK EDSP BAD ARG success. no elements < 1. Description : This routine searches input to nd the element with the maximum value, and returns its address in max ptr. Notes : If several elements have the same maximum value the one nearest the start of input is returned Page 84 of 87 MinI De nition : int MinI( min ptr, input, no elements ) Arguments : short short long **min ptr address of pointer to the minimum element. input[ ] input. no elements number of elements to process. Returns : int EDSP OK EDSP BAD ARG success. no elements < 1. Description : This routine searches input to nd the element with the minimum value, and returns its address in min ptr. Notes : If several elements have the same minimum value the one nearest the start of input is returned Page 85 of 87 PeakI De nition : int PeakI( peak ptr, input, no elements ) Arguments : short short long **peak ptr address of pointer to the peak element. input[ ] input. no elements number of elements to process. Returns : int EDSP OK EDSP BAD ARG success. no elements < 1. Description : This routine searches input to nd the element with the maximum absolute value, and returns its address in peak ptr. Notes : If several elements have the same peak value the one nearest the start of input is returned Page 86 of 87 A Contents of Distribution Disk The distribution disk has six main directories, include, lib, ansisrc, optsrc, demo and test with the contents listed below. The lib directory contains the le| ensigma.lib The include directory contains the le| ensigdsp.h The optsrc directory contains the les| black.c convcomp.asm convcycl.asm convpart.asm corrcycl.asm correlat.asm diir.asm diir1.asm fftc.asm fftinc.asm fftinr.asm fftr.asm fir.asm fir1.asm gengwn.c hamming.c hanning.c iir.asm iir1.asm initdiir.c initfft.c initfir.c initiir.c initlms.c lbr.sub lms.asm lms1.asm logmag.c make.bat makefile matrix.asm max.asm mean.asm min.asm mspower.asm mult.asm peak.asm shift.asm triangle.c variance.asm vectmult.asm The ansisrc directory contains the les| black.c convcomp.c convcycl.c convpart.c corrcycl.c correlat.c diir.c diir1.c fftcom.c fftcore.c fftincom.c fftireal.c fftreal.c fir.c fir1.c gengwn.c hamming.c hanning.c ifftcom.c iffticom.c ifftirel.c ifftreal.c iir.c iir1.c initdiir.c initfft.c initfir.c initiir.c initlms.c lms.c lms1.c logmag.c makefile matrix.c max.c mean.c min.c mspower.c peak.c triangle.c variance.c vectmult.c The demo directory contains the les| c0.src demo.c demoio.c demoio.h link.cmd make.bat makefile The test directory contains the les| c0.src link.cmd make.bat makefile tconvcmp.c tconvcyc.c tconvpar.c tcorr.c tcorrcyc.c testconv.c testdat testfft.c testfilt.c testhead.h testmisc.c testwin.c textr.c tfft.c tgwn.c tifft.c tmatrixm.c tmean.c tmspower.c tstdiir.c tstdiir1.c tstfir.c tstfir1.c tstiir.c tstiir1.c tstlms.c tstlms1.c tstwin.c tvar.c tvectm.c It also contains the testdat directory, which contains the les| filtdat.txt miscdat.txt tconvcmp.txt tconvcyc.txt tconvpar.txt tcorr.txt tcorrcyc.txt textr.txt tmatrixm.txt tmean.txt tmspower.txt tstdiir.txt tstfir.txt tstiir.txt tstlms.txt tstwin.txt tvar.txt tvectm.txt Page 87 of 87 References [1] W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling. Numerical Recipes in C. Cambridge University Press 1988.