Correlator IP Core April 2005 IP Data Sheet Features Introduction Supports 1- to 8-Bit Input Data Width Supports 1 to 256 Channels Supports a Correlation Window from 8 to 2048 Taps Supports Oversampled Input Data from 2x to 8x Supports Real Correlations for Either Signed or Unsigned Data Supports Complex Correlations for Signed Data Allows the User to Tune the Performance of the Design by Specifying the Values of Several Parameters Provides a Selectable Input FIFO for Maximum Data Throughput Allows the User to Specify the Number of Coefficient Sequences Desired, from 1 to 256 The function of this core is to correlate an incoming data stream to a stored binary pattern called a code sequence or coefficient sequence. The data stream may be binary or multi-valued, either signed or unsigned, and is provided to the core one sample at a time. The core can be configured to perform either a real correlation with a single data input stream and a single coefficient sequence, or a complex correlation with two input data streams representing the real and imaginary input terms, and two coefficient sequences representing the real and imaginary coefficients. The core provides multiple channel capability and can support up to 256 channels. Correlations for each channel operate independently from each other. Also, up to 256 different coefficient sequences may be stored in the core, and each channel can select which coefficient sequence is correlated to that channel, so one coefficient sequence could be used for all 256 channels if desired. Figure 1. Correlator IP Core External Interface Diagram clk Correlator rst irdy din[n:0] crdy din_im[n:0] ordy chan_in[n:0] dout[n:0] code_sel_in[n:0] dout_im[n:0] block_start_in chan_out[n:0] decim_r[n:0] block_start_out numtaps_r[n:0] seq_err coeffaddr[n:0] coeffwdat_im[n:0] coeffwdat[n:0] coeffwr (c) 2005 Lattice Semiconductor Corp. All Lattice trademarks, registered trademarks, patents, and disclaimers are as listed at www.latticesemi.com/legal. All other brand or product names are trademarks or registered trademarks of their respective holders. The specifications and information herein are subject to change without notice. The product described herein is subject to continuing development, and applicable specifications and information are subject to change without notice. Such specifications and information are provided in good faith; actual performance is not guaranteed, as it is dependent on many factors, including the user's system design. www.latticesemi.com 1 ip1039_01.0 Lattice Semiconductor Correlator IP Core General Description A correlation function determines how closely a data sequence matches a reference, or "coefficient" sequence. A high correlation value means that the data sequence closely matches the coefficient sequence. A low correlation value means that the data sequence is dissimilar to the coefficient sequence. The basic correlator equation is given by: corr _ win 1 rk = di + k c i k = 0,1,...,num_lags-1 (1) i =0 The terms of the equation are: * di - Input data sequence. The Correlator IP core allows the input sequence to be from 1 to 8 bits wide, and either signed (two's complement) or unsigned data. * ci - Coefficient (or code) sequence. In the Correlator IP core, this sequence must be loaded into internal memory prior to a correlation operation. This sequence is always binary {1,0}; however, the coefficient sequence may represent either an unsigned sequence {1,0} or a signed sequence {+1,-1}. In the case of a signed coefficient sequence, a 1 in the sequence represents a value of +1 and a 0 in the sequence represents a value of -1. The di and ci inputs must both be of the same type, either signed or unsigned. * rk - Correlation result output sequence (correlation between di and ci inputs). * corr_win - Correlation window. This is the number of elements in the input data sequence over which the correlation function is calculated. This is also referred to as the number of "taps." For the Correlator IP core this number is determined by the user when configuring the core. Once selected, the number of data elements (and coefficient terms) is then fixed at this number for all correlation operations. The number of terms in the coefficient sequence is always equal to the number of taps (corr_win) specified. * num_lags - Total number of lags for which the correlation is performed. This is also the length of the correlation result sequence, rk. From Equation 1, a correlation operation takes an input data sequence di of length "corr_win" and multiplies each term in the sequence against the terms of the reference coefficient sequence ci, summing the results of all of the multiplications to produce the result rk. The input data sequence is then shifted by one element and the operation is repeated to produce the next term in the rk sequence. This is done "num_lags" times. In the Correlator IP core, each time a new data term is input to the core, one correlation operation is performed across "corr_win" data and coefficient terms, producing one result, r. The "k" index in Equation 1 does not apply since the Correlator always produces one new result when it receives one new data value. Old data beyond the defined correlation window size is not held in memory. In addition to the basic correlation function described above, the Correlator IP core can be configured to perform complex correlations, defined by the equation: corr _ win 1 rk = di + k c i k = 0,1,...,num_lags-1 (2) i =0 In this case, the data and coefficient input sequences are both complex and each contains a real input sequence and an imaginary input sequence. For the Correlator IP core, the input data sequence is a sequence of signed (two's complement) numbers from 1 to 8 bits wide, and the coefficient sequence is a binary sequence where a coefficient value of 1 represents +1 and a coefficient value of 0 represents a -1. Equation 2 represents the complex conjugate of ci. The complex conjugate multiplication expressed in Equation 2 is given as: dc = (dre cre + dim cim) + j(dim cre - dre cim) 2 (3) Lattice Semiconductor Correlator IP Core Since the coefficients in a complex correlation are restricted to the values {+1,-1}, the multiplications in Equation 3 simplify to inversions of the dre and dim terms, and the whole equation reduces to a series of additions and subtractions. The Correlator IP core performs these inversions and sums the results to produce a new result value r for each new dre and dim term input to the core. In this case, the result sequence will have two terms, a real term and an imaginary term. Functional Description The Correlator IP core is composed of the following functional blocks: * State Machine - Controls the flow of data received from the user. Generates the starting pointer values necessary to read/write the Tap Memory and stores the pointer values in the Channel Memory. Stores the starting pointer values for the next correlation operation in the input FIFO. Generates the "shiftby" value for the aligner. * Channel Memory - Stores the pointer to the location in Tap Memory to write the next data value. * Input FIFO - Stores the pointer to the starting point in Tap and Coefficient Memories for the next correlation. * Tap Memory - Stores the data terms (di). * Coefficient Memory - Stores the coefficient terms (ci). * Aligner - Aligns data and coefficients read from memory for the correlation operation. * Correlator - Performs the equivalent of the multiplication operations in Equations 1 and 2. * Adder/Accumulator - Performs the addition and subtraction operations in Equations 1 and 2. Correlator Input and Output Data The Correlator IP core accepts a new input data value for a channel and writes that value into Tap Memory. When it is ready to perform the next correlation operation for that channel, the new data value will be included in the correlation, along with enough "old" data already in memory to completely fill the correlation window. The Tap Memory is a circular buffer which contains a correlation window's worth of data. As each new value is added to Tap Memory for a particular channel, a correlation operation needs to be run and completed before the next new value is added to memory for that channel. New data can be written into Tap Memory for channels other than the one the Correlator is operating on, however it is the user's responsibility to insure that a correlation operation is done for a particular channel before new data is written into memory for that same channel. This is easy to do for a large number of channels where new data is written to channels in a round-robin sequence, or if the Correlator throughput is not stressed to its limit (i.e. unused cycles appear between correlations), but the problem can be difficult to manage for small numbers of channels. The Correlator IP core will automatically prevent new data being written into Tap Memory and corrupting a correlation as long as the input FIFO depth (parameter 9) is set to 1. Figures 2 and 3 show the timing of the user interface. The state machine accepts one new data value from the user interface at a time. When the state machine is ready to accept a new input data word from the user interface it asserts the crdy signal. The user interface then inputs din, chan_in, code_sel_in, block_start_in, and asserts the irdy signal. When the state machine sees irdy go active, it will take the new data value from the user interface. If the design has been configured for multiple channels, the state machine reads the pointer for that channel from the Channel Memory. This pointer value tells the state machine where in Tap Memory to write the data value just received from the user interface. This pointer value will also be the starting point for the next correlation operation, so this pointer value is stored in the Input FIFO until the next correlation operation is ready to start. Once the present correlation operation finishes, or if no correlation operation was in progress when a new data value was received, then the Input FIFO is read to determine the starting pointer for the next correlation. The state machine begins reading the Tap and Coefficient Memories at the starting pointer location, and it reads until it has read an entire correlation window's worth of data and coefficients. Figure 2 shows at time 173.5s that crdy went active. The user provided a value of 0x3 for channel 0, and set the code_sel_in to 1 which indicates which coefficient sequence is to be used for the correlation of channel 0 data. 3 Lattice Semiconductor Correlator IP Core In this example, two channels and two different coefficient sequences have been configured. Each channel can be correlated to either of the two coefficient sequences. Figure 2. User Interface Timing Diagram for Two-channel Correlator Figure 3. User Interface Timing Diagram for Two-channel Correlator The user drives the irdy signal high for one clock cycle indicating that the input values are valid. Along with the input data, the user also sets the block_start_in signal. This signal will be taken as a marker by the Correlator and aligned with the input data as it passes through the Correlator IP core. The next time the block_start_out signal is set and chan_out = 0, it will indicate that the output data was associated with this input data value. The block_start signals act as markers for the user to do frame alignments between the input and outputs of the Correlator IP core. This is necessary since the core operates on one input data sample at a time. It does not perform multiple correlations over "num_lag" values as expressed in Equation 1. This allows the simplest and most versatile Correlator IP core design. If it is necessary to operate for multiple "lags," then the user application will need to add input and output FIFOs around the core to feed data values one sample at time. At time 178.5s, crdy again goes active indicating that the Correlator IP core is ready to accept the next input value, and in the example of Figure 2 the user inputs data for channel 1. At time 196.5s, the correlation result for channel 0 is ready at the dout outputs, and the core outputs a value of 0x9 on dout, sets the chan_out to 0, and asserts the ordy signal. It also asserts the block_start_out signal to indicate that this output value was associated with the din value from time 175.5s. 4 Lattice Semiconductor Correlator IP Core Figure 4. Tap and Coefficient Memories EBR Block EBR Block EBR Block EBR Block EBR Block EBR Block EBR Block EBR Block EBR Block EBR Block EBR Block Coefficient Memory EBR Block Tap Memory EBR Block Number of Data Words (Taps) per Row of Memory = MWIDTH EBR Block Number of Coefficients per Row of Memory = MWIDTH Aligner Correlator Corr Cell Corr Cell Corr Cell Corr Cell Number of Correlator Cells = MWIDTH Tap and Coefficient Memories While the Tap and Coefficient Memories are being read, the values read are passed to the Aligner. Under the control of the state machine the Aligner shifts the tap data and coefficients to be passed to the Correlator block. The state machine also generates strobe signals to the Aligner which indicate, in any given clock cycle, which tap and coefficient values are valid for the correlator block to work on. The Tap and Coefficient Memories are implemented with EBR blocks as shown in Figure 4. The Correlator IP core will automatically configure and instantiate the proper number of EBR blocks in the design based on the parameters selected by the user. In the case of the Tap Memory, the number of correlator cells, number of taps, number of channels, and the oversampling rate all determine how many EBR memories are needed. The number of correlator cells (parameter MWIDTH) determines how many words of data can be operated on during a single clock cycle. The more correlator cells which are configured, the more multiplication operations can occur in a clock cycle and the overall data throughput goes up. At least one EBR memory is required to feed each correlator cell. All Tap Memory EBR blocks in the design will be configured to be at least the word width of the input data (DWIDTH) wide. The EBR blocks can be sized 1, 2, 4, or 9 bits wide, and must be equal to or greater than DWIDTH. Since each EBR block can store 8192 bits, if the value of [TAP_EBR_WIDTH * (NUM_TAP / MWIDTH) * NUM_CHAN * OS_FACTOR] exceeds 8192 bits, then multiple EBR blocks will be stacked in columns to feed the correlator cells, as shown in Figure 4. TAP_EBR_WIDTH is the minimum allowed EBR width which is at least DWIDTH wide. The Tap Memory EBRs will be configured automatically for the user; however, the user is responsible for determining the total number of EBR blocks needed for the design and insure that the target LatticeECTM device contains enough memories. The Coefficient Memories are also implemented in EBR blocks. Since each coefficient is constrained to be 1 bit, the total amount of memory required for coefficients is generally less than that required for tap data. Each EBR 5 Lattice Semiconductor Correlator IP Core block can be a maximum of 36 bits wide, therefore if the number of correlator cells (MWIDTH) is not greater than 36, only one column of EBR memories is required for the Coefficient Memories. If MWIDTH > 36, then multiple columns will be configured. As in the case of the Tap Memories, if the total number of coefficients which needs to be stored exceeds one row of EBR memories, then multiple rows will be configured in a stacked arrangement as shown in Figure 4. For MWIDTH 36, the number of coefficients required is [NUM_TAP * NUM_COEF_SEQ]. If this number is less than 8192 then only one EBR is needed for the Coefficient Memory. Unlike the Tap Memories which are written with new user data under the control of the state machine, the Coefficient Memories must be written with the coefficient sequences before any correlation operations can be done. This is done via the Coefficient Memory Configuration interface shown in Figure 5. This interface consists of the input signals: coeffaddr, coeffwdat, coeffwdat_im, and coeffwr. Figure 6 shows the timing for this interface for a two-channel design with MWIDTH=4, NUM_TAP=16 and NUM_COEF_SEQ=2. Figure 5. Coefficient Memory Configuration Interface Timing In this case, the Coefficient Memory is implemented in one EBR block. Each row of Coefficient Memory is required to store MWIDTH=4 coefficients, so each write to the memory writes four bits. Each coefficient sequence is NUM_TAP=16 bits long, and it will occupy (NUM_TAP / MWIDTH)= 4 rows in the Coefficient Memory. In addition, in this example there are two separate coefficient sequences, so the coefficients will occupy a total of eight rows in the Coefficient Memory. Writes to the Coefficient Memory are enabled by asserting the coeffwr input. The coeffaddr input selects the row of memory to be written, and coeffwdat (and coeffwdat_im for complex correlations) is set to the desired value. This is a very simple interface, however it is essential to make sure that the coefficient sequence is written in the correct order. In the example above, the first four values written are for coefficient sequence 0. The values written are 0xa6fc (or in binary: 1010 0110 1111 1100) with the LSB being the first bit in the correlation sequence. This bit will be multiplied against the newest data value received by the Correlator. The MSB in this string will be multiplied against the oldest data read from Tap Memory. This is explained further in the Correlator Evaluation Package section of this document. The second coefficient sequence written into the Coefficient Memory is 0x0180, and is written into rows 7, 6, 5 and 4. This will be selected as coefficient sequence 1 by setting the code_sel_in to 1 when a data value is input to the Correlator. Figure 6 shows how the coefficient values from this example would appear in the Coefficient Memory: 6 Lattice Semiconductor Correlator IP Core Figure 6. Example Coefficient Sequences Written in Coefficient Memory Upper Locations of EBR (Unused in this Example) Coefficient Sequence 1 Coefficient Sequence 0 Row 7 0 0 0 0 Row 6 0 0 0 1 Row 5 1 0 0 0 Row 4 0 0 0 0 Row 3 1 0 1 0 Row 2 0 1 1 0 Row 1 1 1 1 1 Row 0 1 1 0 0 MWIDTH wide In configurations where the number of taps is not a power of 2, the coefficient sequences will need to be padded with zeros so that all coefficient sequences written to the Coefficient Memory are a power of 2 long. This is because the Coefficient and Tap Memories must be divided up evenly into sections representing the individual channels and coefficient sequences. In the example above, if the number of taps were less than 16, the coefficients would still be written into memory the same way except that padding zeros would be added before the MSB. The padding zeros would be written into memory starting at the MSB of row 3 for coefficient sequence 0. Correlator and Adder/Accumulator Blocks The Correlator block performs the multiplication operations in Equations 1 and 2. The coefficients are configured by the user to be either unsigned or signed. If unsigned, then the binary coefficient values simply represent {1,0} and the multiplications reduce to either passing the tap values read from memory to the Adder/Accumulator, or passing a zero value. If the coefficients are signed, then the binary coefficients {1,0} represent values of {+1,-1}. If a tap value is multiplied by 1, then the Correlator block does nothing other than pass the tap value read from memory to the Adder/Accumulator. If a tap value is multiplied by -1, then the Correlator block does a two's complement conversion of the tap value read from memory and passes the result to the Adder/Accumulator, which in turn completes the summation of the correlation sequence to generate the final result. Decimation The Correlator IP core allows the input data to be oversampled from two to eight times the normal sampling rate. The OS_FACTOR parameter should be set to the correct oversampling rate. When this is done, the core will automatically decimate the amount of data which is included in the correlation operations by the correct amount. For example, if the number of taps is eight and an oversampling rate of two is chosen, then the circuit will correlate the eight coefficient values with the newest input tap data value and the odd numbered tap data values from the past 15 "old" data values. The correlation will look like this: r = d1c1 + d3c2 + d5c3 + d7c4 + d9c5 + d11c6 + d13c7 + d15c8 (4) The number of data values stored in Tap Memory for a given channel becomes [OS_FACTOR*NUM_TAP], or in this case 16. The number of coefficients per channel is still equal to NUM_TAP. 7 Lattice Semiconductor Correlator IP Core Parameter Descriptions The parameters used for configuring the Correlator IP core are listed below. The values of these parameters must be set prior to synthesis or functional verification. Table 1. User Configurable Parameters Parameter Number Parameter Parameter Description Input Range Default Input Value Parameter Values 1 DWIDTH Input data width 1-8 4 -- 2 NUM_TAP Number of taps 8-2048 16 -- 3 MWIDTH Number of correlator cells Minimum = 1 Maximum = the number of EBR blocks in the target LatticeEC device 4 -- 4 NUM_CHAN Number of channels 1-256 2 -- 5 DTYPE Input data type Signed, unsigned Unsigned "UNSIGNED" "SIGNED" 6 COMPLEX Correlation type Real, complex Real Real = 0 Complex = 1 7 OS_FACTOR Oversampling rate 1-8 1 -- 8 PERFORMANCE Performance 1, 2, 3 1 -- 9 FIFO_DEPTH Input FIFO depth 1, 2, 3 1 -- NUM_COEF_SEQ Number of coefficient sequences 1-256 NUM_CHAN -- 10 The basic configuration parameters should be selected based on the type of correlation desired. These include parameters 1, 2, 4, 5, 6, 7 and 10. The remaining parameters 3, 8 and 9 are selected based on the desired performance of the circuit. For parameter 3, a higher fMAX can be achieved by generating a much smaller circuit (smaller number of correlator cells). However, for long data sequences (number of taps, or "corr_win"), this will mean that many clock cycles are needed for each correlation result to be calculated resulting in very poor overall data throughput and long latency times. For higher data throughput, and at the expense of a larger and therefore more complicated circuit, a higher number of correlators should be chosen. The Correlator IP core is architected to be highly pipelined, so even for large numbers of correlator cells, the penalty in fMAX is small; however, as the design becomes more complicated, it will eventually reach a point where the fMAX is impacted by routing in the FPGA. Parameter 8 should be set to 1 for the highest performance circuit. A value of 2 or 3 will result in a smaller, but significantly lower performance design. Parameter 9 sets the depth of the input FIFO. This improves throughput performance by allowing the next input data sample to be presented to the device while the present correlation result is being calculated. However, care must be used when changing this parameter. If the FIFO depth is set above 1, then the user must insure that a new data sample will not be presented to the Correlator IP core for the same channel as is presently being serviced or the new data sample will be written into the core's internal tap memory and will corrupt the correlation which is already in progress for that channel. If the core has been configured for multiple channels, and input data values for the same channel are never presented to the core adjacent to each other in time, then the FIFO depth can be safely increased beyond 1. For example, if the core is configured for eight channels, and data for each of the eight channels is always presented in sequence, then the FIFO depth may be increased to 2 or 3. However, if the core is configured for one or two channels, or the input data sequences through channels at random, then the FIFO depth should never be increased beyond 1. 8 Lattice Semiconductor Correlator IP Core Custom Core Configurations For core configurations that are not available in the Evaluation Packages, please contact your Lattice sales representative to request a custom configuration. Related Information For more information regarding core usage and design verification, refer to the Parallel RapidIO Physical Layer Interface IP Core User's Guide, available on the Lattice web site at www.latticesemi.com. 9 Lattice Semiconductor Correlator IP Core Appendix for LatticeECPTM and LatticeECTM Devices Table 2. Performance and Resource Utilization1 Parameter Filename Corr_8bit_e2_1_001.lpc Parameter Settings SLICEs LUTs Registers External Pins sysMEMTM EBRs fMAX See Table 3 212 140 310 41 5 212.54 MHz 1. Performance and utilization characteristics are generated using LFEC20E-5F672C in Lattice's ispLEVER (R) 4.2 software. When using this IP core in a different density, speed, or grade within the Lattice ECP/EC family, performance may vary. Supplied Netlist Configurations The Ordering Part Number (OPN) for the Correlator IP Core on LatticeECP/EC devices is CORR-8BIT-E2-N1 (for all configurations of the netlist package). Table 3 lists the evaluation netlists that can be downloaded from the Lattice web site at www.latticesemi.com. To load the preset parameters for this core, click on the "Load Parameters" button inside the IP Manager tool. Make sure that you are looking for a file inside this core's directory location. The Lattice Parameter Configuration files (.lpc) are located inside this directory. Table 3. Parameter Settings of the Evaluation Packages Input Data # of # of # of Input Correlation Over-sampling Width Taps Correlators Channels Data Type Type Rate Performance 4 16 4 2 Unsigned Real 10 1 1 Input FIFO Depth # of Coefficient Sequences 1 2 Mouser Electronics Authorized Distributor Click to View Pricing, Inventory, Delivery & Lifecycle Information: Lattice: CORR-8BIT-E2-UT2 CORR-8BIT-PM-UT2 CORR-8BIT-XM-UT2