NetFPGA SUME Reference Manual
Overview
Powered by a Xilinx Virtex-7 XC7V690T FPGA, the NetFPGA-SUME board is an ideal platform for high-performance, high-density networking designs.
Its 32 RocketIO GTH serial transceivers provide access to 8 lanes of end-point PCI-E (Gen3 x8), 4 SFP+ (10Gbps) ports, 2 SATA-III (6Gbps) ports, and 18 data-rate-adjustable GTH ports through an HPC FMC connector and a QTH connector.
Wide, high-speed memory interfaces in the form of three x36 QDRII SRAM interfaces and two x64 DDR3 SODIMMs provide a memory solution suited to common networking applications.
The board also comes with a modest code base of hardware and software for getting started. The design flow is based on the Xilinx Vivado Design Suite, and the code base is available on GitHub after registration.
Block Diagram
Feature List
FPGA
Xilinx Virtex-7 XC7V690T FFG1761-3
Memory
Three x36 72Mbits QDR II SRAM (CY7C25652KV18-500BZXC)
Two 4GB DDR3 SODIMM (MT8KTF51264HZ-1G9E1)
Communication Interface
PCI-E Gen3 x8 (8Gbps/lane)
Four SFP+ interface (4 RocketIO GTH transceivers) supporting 10Gbps
Expansion Connectors
QTH Connector (8 RocketIO GTH transceivers)
Two SATA-III ports
One HPC FMC Connector (10 RocketIO GTH transceivers)
One 12-pin Pmod Connector
Programming
MicroUSB Connector for JTAG programming and debugging (shared with UART interface)
Xilinx CPLD XC2C512 for FPGA configuration
Two 512Mbits Micron StrataFlash (PC28F512G18A) for bitfile storage
Other I/Os
User LEDs and Push Buttons
Micro-SD Card Slot
Walk around the Board
Include Board P/N, Version, etc.
Development Environment
Functional Description
Power
Input Supply
The NetFPGA-SUME receives power via a 2 x 4 pin PCI Express Auxiliary Power Connector. The 2 x 4 pin PCI Express Auxiliary Power receptacle (header J14) can accept both 2 x 3 and 2 x 4 pin PCI Express Auxiliary Power Plugs
found on a standard ATX power supply. When installed on a PC motherboard, you can plug the 2 x 3 or 2 x 4 pin PCI Express power supply connector directly into header J14. When used in standalone mode (without a PC motherboard),
pins 15 and 16 of the main 20 pin connector of the standard ATX power supply must be shorted together as shown in Figure 1. If these pins aren’t shorted together then the power supply will not turn on.
Figure 1. Pin 15 and 16 of Standard ATX power supply shorted together.
According to Revision 1.0 of the PCI Express 225 W/300 W High Power Card Electromechanical Specification the 2 x 3 pin plug is guaranteed to deliver up to 75 watts of power, while the 2 x 4 pin plug is guaranteed to deliver up to 150 watts of
power. While the board may be powered by either a 2 x 3 pin or a 2 x 4 pin PCI Express Auxiliary Power plug, due to the potential for high power consumption, Digilent recommends using a 2 x 4 pin plug to provide power whenever
possible.
Figure 2. Power Connector (J14).
Figure 2 describes the pin-out of the power connector (header J14) when a 2 x 4 pin or a 2 x 3 pin plug is used. The Sense0 and Sense1 pins are connected to GND when power is present and left floating otherwise. Since the 2 x 3 pin plug does not include a Sense1 pin, it’s possible to determine what type of plug is present, and thus how much power can be consumed.
The FPGA logic can determine whether or not a 2 x 4 pin plug is present by enabling an internal pull-up on pin AW42 and then checking the state of that pin. If logic ‘0’ is seen on AW42 then a 2 x 4 pin plug is connected and up to 150 watts of power can be drawn. If logic ‘1’ is seen on AW42 then a 2 x 3 pin plug is connected, and the board’s power consumption should be limited to 75 watts or less.
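The plug-detection rule above can be summarized in a few lines of Python. This is only an illustrative sketch of the decision logic, not FPGA code; the function name `power_budget_watts` is hypothetical, and the sampled level of AW42 (with its internal pull-up enabled) is assumed to be passed in as 0 or 1.

```python
def power_budget_watts(aw42_level: int) -> int:
    """Map the logic level sampled on pin AW42 (Sense1) to a power budget.

    0 -> Sense1 is grounded, so a 2 x 4 pin plug is present (150 W).
    1 -> Sense1 is floating and pulled up, so a 2 x 3 pin plug is present (75 W).
    """
    if aw42_level not in (0, 1):
        raise ValueError("aw42_level must be 0 or 1")
    return 150 if aw42_level == 0 else 75
```

A design that reads this pin could, for example, hold high-power logic in reset whenever the budget is 75 watts.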
Power Supply Topology
The high performance Virtex 7 FPGA, QDRII+ memories, and DDR3 memories featured on the NetFPGA-SUME require several different supply voltages (supply rails) in order to function. These components also require that the supply
rails are sequenced on and off in a particular order. Table 1 lists the various supply rails, their nominal voltages, and rated output currents.
Table 1
These supply rails are derived from the 12V input (VCC12V0, which comes from header J14) using eight high-efficiency switching regulators and one low-dropout (LDO) linear regulator from Linear Technology. Since both the DDR3 and QDRII+ I/O supplies are powered from the VCC1V5 rail, two ferrite beads are also included to prevent high-speed switching noise caused by one memory from affecting the other. Figure 4 shows how the various supplies are derived from the input.
A Linear Technology LTC6909 is used to generate six out-of-phase 302 kHz clocks. Each clock output is offset by 60 degrees from the next (see Fig. 3). These out-of-phase clocks are used as the input clocks for the regulators that produce the high-power output supply rails (VCC1V0, VCC1V5, VCC1V8, VCC3V3, and MGTAVCC). The LTC3839, which produces the VCC1V0 supply rail, is a dual-phase converter that directly utilizes the OUT1 clock and indirectly utilizes the OUT4 clock. The use of out-of-phase clocks reduces the input RMS ripple current.
Figure 3. LTC6909 Clock Output Phase Relationship.
Figure 4. Regulator Topology.
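As a rough illustration of the clock phasing described above, the following Python sketch tabulates the phase and time offset of each LTC6909 output. It assumes that output OUTn is delayed by (n-1) x 60 degrees relative to OUT1; that mapping and the function name `phase_table` are assumptions made for illustration. Under that assumption OUT1 and OUT4 are 180 degrees apart, which is what a dual-phase converter would be expected to use.

```python
CLOCK_HZ = 302_000   # LTC6909 output frequency stated in the text
NUM_OUTPUTS = 6      # six out-of-phase clock outputs


def phase_table(freq_hz: float = CLOCK_HZ, outputs: int = NUM_OUTPUTS):
    """Return (name, phase in degrees, delay in ns) for each clock output.

    Assumes evenly spaced phases, with OUT1 taken as the 0 degree reference.
    """
    period_ns = 1e9 / freq_hz
    step_deg = 360 / outputs
    table = []
    for i in range(outputs):
        phase_deg = i * step_deg
        delay_ns = phase_deg / 360 * period_ns
        table.append((f"OUT{i + 1}", phase_deg, delay_ns))
    return table
```

At 302 kHz the period is roughly 3.3 microseconds, so each 60 degree step corresponds to a delay of a little over half a microsecond.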
Power Supply Sequencing and Supervising
The components on the NetFPGA-SUME require that the supply voltages be sequenced on and off in a particular order. The NetFPGA-SUME utilizes two Linear Technology LTC2974’s to ensure that these sequencing requirements are
met. Each LTC2974 supports “cascade sequence ON with time-based sequence off” and can monitor the input voltage, four output voltages, four output currents, and four external temperatures using a 16-bit ADC. Additionally, the
LTC2974 can margin and trim up to four output voltages using a 10-bit DAC, allowing for more precise output voltages.
Figure 5. LTC2974 Sequencer and Supervisor.
Figure 5 depicts the connections between the two LTC2974’s, as well as the signals that are used to control the power on and off sequence. When the input voltage (VCC12V0) exceeds 10 volts and the power switch (SW1) is placed in the “ON” position, the LTC2974’s perform a power on sequence and the rails come up in the following order:
1. VCC1V0
2. VCC1V8
3. VCC2V0
4. MGTAVCC
5. MGTAVTT
6. VCC3V3
7. VCC1V5, QDRVTT, and DDRVTT
8. MGTVAUX
When the input voltage falls below 9 volts, or the power switch transitions to the “OFF” position, the LTC2974’s perform a time-based off sequence and the rails come down in the following order:
1. MGTVAUX
2. VCC1V5, QDRVTT, and DDRVTT
3. VCC3V3
4. MGTAVTT
5. MGTAVCC
6. VCC2V0
7. VCC1V8
8. VCC1V0
Figure 6. Power ON/OFF Sequence.
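The sequencing rules above can be modeled as follows. This is a behavioral sketch for reasoning about the sequence, not a description of how the LTC2974 is programmed. The names `POWER_ON_ORDER`, `POWER_OFF_ORDER`, and `next_power_state` are hypothetical; the 10 volt and 9 volt thresholds and the rail order are taken from the text.

```python
# Each tuple is one sequencing step; rails in the same tuple switch together.
POWER_ON_ORDER = [
    ("VCC1V0",),
    ("VCC1V8",),
    ("VCC2V0",),
    ("MGTAVCC",),
    ("MGTAVTT",),
    ("VCC3V3",),
    ("VCC1V5", "QDRVTT", "DDRVTT"),
    ("MGTVAUX",),
]

# The power-off order listed in the text is the exact reverse of power-on.
POWER_OFF_ORDER = list(reversed(POWER_ON_ORDER))

ON_THRESHOLD_V = 10.0   # input must exceed this before rails are sequenced on
OFF_THRESHOLD_V = 9.0   # input falling below this triggers the off sequence


def next_power_state(powered: bool, vin_volts: float, switch_on: bool) -> bool:
    """Return whether the rails should be on after evaluating the inputs."""
    if powered:
        # Stay on unless the input sags below 9 V or the switch is turned off.
        return switch_on and vin_volts >= OFF_THRESHOLD_V
    # Turn on only when the switch is ON and the input exceeds 10 V.
    return switch_on and vin_volts > ON_THRESHOLD_V
```

Note the hysteresis: with the switch on and the input at 9.5 volts, rails that are already on stay on, while rails that are off stay off.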
The LTC2974’s constantly monitor the output voltage, current, and temperature associated with each channel (supply rail). This information, referred to as telemetry data, is used to determine the on status of each supply rail, as well as
monitor for fault and warning conditions. When a fault or a warning occurs the FPGA application may be notified via an interrupt that’s signaled by the LTC2974’s ALERTB, AUXFAULTB, or FAULTB1 pins. The FPGA application may
then read (using I2C) one or more of the LTC2974 status registers (defined in the datasheet) to determine the source of the fault or the warning. The output voltage, current, power, and temperature associated with any channel may also be
read using the applicable PMBus (I2C) commands, which are defined in the LTC2974 datasheet.
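Telemetry words returned by PMBus read commands have to be converted to engineering units. The Python sketch below decodes the two standard PMBus numeric formats. The Linear11 layout (5-bit signed exponent, 11-bit signed mantissa) comes from the PMBus specification; the fixed exponent of -13 used for the 16-bit voltage format is an assumption about the LTC2974 that should be checked against its datasheet. The function names are hypothetical, and the I2C transfer itself is not shown.

```python
def decode_linear11(word: int) -> float:
    """Decode a PMBus Linear11 word (typically current, power, temperature)."""
    exponent = (word >> 11) & 0x1F   # upper 5 bits, two's complement
    mantissa = word & 0x7FF          # lower 11 bits, two's complement
    if exponent > 0x0F:
        exponent -= 0x20
    if mantissa > 0x3FF:
        mantissa -= 0x800
    return mantissa * 2.0 ** exponent


def decode_linear16(word: int, exponent: int = -13) -> float:
    """Decode a PMBus 16-bit unsigned voltage word.

    The default exponent of -13 is an assumption; consult the datasheet.
    """
    return (word & 0xFFFF) * 2.0 ** exponent
```

For example, under these assumptions a voltage word of 0x2000 decodes to 1.0 volt, and a Linear11 word of 0xF064 decodes to 25.0.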
In order to generate faults and warnings each channel of the LTC2974 must be configured with a nominal output voltage, under voltage warning limit, over voltage fault limit, under current warning limit, over current warning limit, over
current fault limit, under current fault limit, under temperature warning limit, under temperature fault limit, and over temperature fault limit. Tables 2 and 3 describe the voltage and current limits as pre-configured by Digilent during the
manufacturing process.
Table 2
Table 3
Fault and Warning Interrupt Sources
When the LTC2974 detects a fault or a warning condition it may signal an interrupt by driving the ALERTB, AUXFAULTB, or FAULTB1 pins.
The ALERTB pin of the LTC2974 is an open drain output that is driven low whenever a fault or warning occurs. The ALERTB pins of the two LTC2974’s (IC43 and IC44) are connected in a wired-AND fashion via the PCON_ALERT_B net to pin J41 of the FPGA (IC12), as shown in Figure 7. Enabling the internal pull-up on pin J41 will allow the FPGA application to use this pin as an interrupt when any of the following conditions occur:
Output overvoltage or under voltage fault/warning
Output over current or under current fault/warning
Over temperature fault/warning
Channel output voltage has not reached or exceeded the under voltage fault limit set for that channel within TON_MAX_FAULT_LIMIT milliseconds (set to 15ms) of the output being enabled
When any of the above faults occur the PCON_ALERT_B net will be driven low until the fault condition has been removed and the CLEAR_FAULTS command has been sent to both of the LTC2974’s (IC43 and IC44).
Figure 7. ALERTB Interrupt Source.
The AUXFAULTB pins of the two LTC2974’s are connected in a wired-AND fashion to the cathode of a Schottky diode via the AUXFAULT net as shown in Figure 8. The anode of the Schottky diode is connected to pin M41 on the FPGA via the PCON_AUXFAULT_B net; the diode protects the FPGA pin from high voltages. The LTC2974’s are configured to drive the AUXFAULTB pin low when any of the following conditions occur:
Output overvoltage fault on any channel
Output over current or under current fault on any channel
When any of the above conditions occurs the AUXFAULT net will be driven low, which causes the PCON_AUXFAULT_B net to be pulled low through the diode. The AUXFAULT net will remain driven low until the LTC2974 experiencing the fault condition is commanded to re-enter the ON state. Enabling the internal pull-up on pin M41 will allow the FPGA application to detect when one of the faults described above has occurred.
Figure 8. AUXFAULTB Interrupt Source.
The FAULTB1 pin of the LTC2974 is a bi-directional open-drain input/output that can be configured to drive low in response to any channel entering a “faulted off state”. The LTC2974 can also be configured to disable any given channel in
response to a logic low being detected on the FAULTB1 pin. However, it has been pre-configured by Digilent during manufacturing to serve strictly as an output that indicates when any channel has faulted off.
The FAULTB1 pins of the two LTC2974’s are connected in a wired-AND fashion to the gate of a transistor (N-FET). This transistor connects to the PCON_FAULT1 net, which is in turn connected to pin N40 of the FPGA as shown in
Figure 9. When neither of the FAULTB1 pins is asserted low the gate of the transistor is pulled high and the PCON_FAULT1 net is connected to ground. When any channel enters the “faulted off state” the gate of the transistor is driven
low and the transistor turns off. Enabling an internal pull-up on pin N40 will allow the FPGA application to detect logic ‘1’ when any channel has faulted off and logic ‘0’ when no channels have faulted off.
Figure 9. FAULTB1 Interrupt Source.
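The three interrupt pins use different polarities, which is easy to get wrong. The following Python sketch collects the polarity rules from this section in one place. It assumes the internal pull-ups on J41, M41, and N40 are enabled and that the sampled pin levels are supplied as 0 or 1; the `FaultStatus` type and the `decode_fault_pins` function are hypothetical names used for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FaultStatus:
    alert: bool        # fault or warning signaled on PCON_ALERT_B
    aux_fault: bool    # overvoltage or current fault on PCON_AUXFAULT_B
    faulted_off: bool  # at least one channel has faulted off (PCON_FAULT1)


def decode_fault_pins(j41: int, m41: int, n40: int) -> FaultStatus:
    """Interpret the sampled levels of FPGA pins J41, M41, and N40."""
    return FaultStatus(
        alert=(j41 == 0),        # ALERTB is active low
        aux_fault=(m41 == 0),    # AUXFAULTB is active low (through the diode)
        faulted_off=(n40 == 1),  # inverted by the N-FET, so active high
    )
```

With no faults present the pins read J41 = 1, M41 = 1, and N40 = 0, which decodes to all three flags being false.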
Power Consumption
The power consumed by the NetFPGA-SUME is largely dependent on the number of resources utilized by a given design, and the complexity of that design. This makes estimating power consumption for any particular application difficult.
However, during the design process it was necessary to estimate the worst case scenario power consumption for each of the supply rails and to come up with a total power budget. Table 4 lists the supply voltages, their rated output current,
and the maximum power that they can output.
Table 4
Maximum Output Power: 40+22.5+27+4+49.5+8+3.6+12+0.6=167.2 Watts
The actual power consumed from the input supply is greater than the maximum output power because the switching regulators are not 100% efficient. The switching regulators used on the NetFPGA-SUME were designed to operate at approximately 90% efficiency. The total power consumed from the input supply can be as high as 184.4 Watts (see note 1), a figure that assumes up to 29.1 Watts is drawn by the FMC mezzanine module. However, ANSI/VITA 57.1 says that mezzanine modules may dissipate a maximum of 10 Watts. This means that the maximum output power is actually 167.2-(29.1-10)=148.1 Watts.
Determining the input power consumption after taking into account the 10 Watt output power limitation of the FMC connector is difficult, as we do not know which supply rails will be utilized by an attached FMC mezzanine module.
Assuming that all power was consumed from VADJ and 3P3V the total input power consumption would be reduced by 12+((19.1-12)/0.9)=19.9 Watts. As a result, the input power consumption could be as high as 184.4-19.9=164.5 Watts
when an FMC mezzanine module is attached. If no FMC mezzanine module is attached then the maximum input power consumption is (40+22.5+(1.8*11)+4+(3.3*12)+8+3.6)/0.9+0.6=153.4 Watts.
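The power budget arithmetic in this section can be cross-checked with a short Python sketch. Table 4 is not reproduced here, so the per-rail wattages are taken from the "Maximum Output Power" sum and are not labeled with rail names. Treating the 12 W and 0.6 W terms as not scaled by regulator efficiency mirrors the expressions in the text. The 4 W rail is included in the no-FMC case because that is what reproduces the stated 153.4 Watts. The function name `power_budget` and the constant names are hypothetical.

```python
EFFICIENCY = 0.9                               # approximate regulator efficiency
REGULATED_W = [40, 22.5, 27, 4, 49.5, 8, 3.6]  # outputs scaled by efficiency
UNSCALED_W = [12, 0.6]                         # terms the text adds without scaling
FMC_RAILS = [(1.8, 4), (3.3, 3), (12.0, 1)]    # (volts, amps): VADJ, 3P3V, 12P0V
FMC_LIMIT_W = 10                               # ANSI/VITA 57.1 mezzanine limit


def power_budget() -> dict:
    """Recompute the power figures quoted in the text, in watts."""
    max_output = sum(REGULATED_W) + sum(UNSCALED_W)
    max_input = sum(REGULATED_W) / EFFICIENCY + sum(UNSCALED_W)
    fmc_full = sum(volts * amps for volts, amps in FMC_RAILS)
    excess = fmc_full - FMC_LIMIT_W
    # Assumes the module draws only from VADJ and 3P3V, so the whole 12 W
    # of 12P0V is removed, plus the rest of the excess from regulated rails.
    input_reduction = 12 + (excess - 12) / EFFICIENCY
    # Without an FMC module: 1.8 V supplies 15-4=11 A, 3.3 V supplies 15-3=12 A.
    no_fmc_regulated = 40 + 22.5 + 1.8 * 11 + 4 + 3.3 * 12 + 8 + 3.6
    return {
        "max_output": max_output,
        "max_input": max_input,
        "fmc_full": fmc_full,
        "output_fmc_limited": max_output - excess,
        "input_fmc_limited": max_input - input_reduction,
        "input_no_fmc": no_fmc_regulated / EFFICIENCY + 0.6,
    }
```

Rounded to one decimal place, the results agree with the figures quoted in this section: 167.2, 184.4, 29.1, 148.1, 164.5, and 153.4 Watts respectively.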
FPGA Configuration
After power-on, the Virtex-7 FPGA must be configured (or programmed) before it can perform any functions. You can configure the FPGA in one of two ways:
1. A PC can use the Digilent USB-JTAG circuitry (port J16, labeled “PROG”) to program the FPGA any time the power is on.
2. One of four bitstream files stored in the parallel flash can be loaded by the onboard CPLD.
The figure above shows the different options available for configuring the FPGA. An on-board “mode” jumper (JP1) selects between the two programming modes.
The FPGA configuration data is stored in files called bitstreams that have the .bit file extension. The ISE or Vivado software from Xilinx can create bitstreams from VHDL, Verilog®, or schematic-based source files (in the ISE toolset,
EDK is used for MicroBlaze™ embedded processor-based designs).
Bitstreams are stored in SRAM-based memory cells within the FPGA. This data defines the FPGA’s logic functions and circuit connections, and it remains valid until it is erased by removing board power, by pressing the reset button
attached to the PROG input, by writing a new configuration file using the JTAG port, or by triggering the onboard CPLD to load a new bitstream from the parallel flash.
A Virtex-7 690T bitstream is typically 229,878,496 bits and can take a long time to transfer. The time it takes to program the NetFPGA-SUME can be decreased by compressing the bitstream before programming, and then allowing the
FPGA to decompress the bitstream itself during configuration. Depending on design complexity, compression ratios of 10x can be achieved. Bitstream compression can be enabled within the Xilinx tools (ISE or Vivado) to occur during
generation. For instructions on how to do this, consult the Xilinx documentation for the toolset being used.
After being successfully programmed, the FPGA will cause the “DONE” LED to illuminate. Pressing the “PROG” button at any time will reset the configuration memory in the FPGA. After being reset, the FPGA will immediately attempt
to reprogram itself from the parallel flash, assuming JP1 is not loaded.
The following sections provide greater detail about programming the NetFPGA-SUME using the different methods available.
JTAG Configuration
The Xilinx tools typically communicate with FPGAs using the Test Access Port and Boundary-Scan Architecture, commonly referred to as JTAG. During JTAG programming, a .bit file is transferred from the PC to the FPGA using the
onboard Digilent USB-JTAG circuitry (port J16) or an external JTAG programmer, such as the Digilent JTAG-HS2, attached to port J9. You can perform JTAG programming any time after the NetFPGA-SUME has been powered on,
regardless of whether or not the mode jumper (JP1) is set. If the FPGA is already configured, then the existing configuration is overwritten with the bitstream being transmitted over JTAG. Setting the mode jumper is useful to prevent the
FPGA from being configured from the parallel flash.
Programming the NetFPGA-SUME with an uncompressed bitstream using the on-board USB-JTAG circuitry usually takes around a minute. JTAG programming can be done using the hardware server in Vivado or the
iMPACT tool included with ISE.
Configuration using Parallel Flash
In order to meet the PCIe specification, an expansion card must be able to respond to PCI enumeration commands within 200 milliseconds of the power supplies becoming stable. On the NetFPGA-SUME, responding to PCI commands
requires the FPGA to be configured, so meeting this spec requires an extremely fast configuration solution be used. This is achieved by using a CPLD that reads a stored bitstream out of flash and configures the FPGA over a 32-bit
SelectMAP interface clocked at 100MHz.
Digilent designed the firmware for the CPLD so that four different bitstreams can be stored in the flash.
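A back-of-the-envelope check shows why the SelectMAP path can meet the 200 millisecond requirement and why four bitstreams fit in the two flash devices. The Python sketch below uses only figures stated in this manual. It ignores power-up, FPGA initialization, and flash access overhead, and it assumes an uncompressed bitstream and that 512 Mbits means 512 x 2^20 bits. All names are hypothetical.

```python
BITSTREAM_BITS = 229_878_496       # typical Virtex-7 690T bitstream size
SELECTMAP_WIDTH_BITS = 32          # SelectMAP bus width
SELECTMAP_CLOCK_HZ = 100_000_000   # SelectMAP clock
PCIE_DEADLINE_S = 0.200            # enumeration deadline cited in the text
FLASH_DEVICES = 2
FLASH_BITS_EACH = 512 * 2**20      # assumption: 1 Mbit = 2**20 bits
STORED_BITSTREAMS = 4


def selectmap_transfer_seconds(bits: int = BITSTREAM_BITS) -> float:
    """Ideal transfer time with one full-width word moved per clock."""
    return bits / (SELECTMAP_WIDTH_BITS * SELECTMAP_CLOCK_HZ)


def bitstreams_fit_in_flash(count: int = STORED_BITSTREAMS) -> bool:
    """Check whether `count` uncompressed bitstreams fit in the flash."""
    return count * BITSTREAM_BITS <= FLASH_DEVICES * FLASH_BITS_EACH
```

Under these assumptions the ideal transfer takes roughly 72 milliseconds, which leaves margin inside the 200 millisecond window, and four uncompressed bitstreams fit within the combined flash capacity.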
Memory
DDR3 SODIMM
The NetFPGA-SUME board comes with two Micron MT8KTF51264HZ-1G9 4GB DDR3 SDRAM SODIMMs, each of which employs a 932.84MHz, 64-bit-wide data bus capable of operating at a data rate of 1866MT/s. Project development with the SDRAM involves using the Xilinx Memory Interface Generator (MIG) in Vivado Design Suite. The interface is automatically configured by the MIG for use with the AXI4 system bus and provides a fixed 4:1 memory to bus clock ratio.
The input clock for both SDRAM SODIMMs is a 233MHz clock generated by a Discera DSC1103 Low Jitter Precision LVDS Oscillator. The SDRAM clock period is configured to 1177ps (849.62MHz), equivalent to approximately 1700MT/s, due to read margin issues. Please refer to Xilinx Answer Record AR61853 for further information. The NetFPGA-SUME uses a VCCAUX-IO of 2.0V to support high performance DDR3 frequency settings. Please see the Xilinx 7 Series FPGAs Memory Interface Solutions User Guide (UG586) and the Micron 1GB, 2GB, 4GB (x64, SR) 204-Pin DDR3L SODIMM data sheet for more details. The DDR3 unit test project in the NetFPGA repository provides a good starting point for project development.
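The relationship between the clock period, the memory clock, the data rate, and the user-side clock can be worked through in Python as follows. This is an illustrative calculation, not MIG configuration; the function names are hypothetical. The 1072 ps period used in the usage note is inferred from the 932.84MHz figure and is an assumption.

```python
def clock_mhz(period_ps: float) -> float:
    """Convert a clock period in picoseconds to a frequency in MHz."""
    return 1e6 / period_ps


def data_rate_mts(mem_clock_mhz: float) -> float:
    """DDR transfers data on both clock edges: two transfers per cycle."""
    return 2 * mem_clock_mhz


def user_clock_mhz(mem_clock_mhz: float, ratio: int = 4) -> float:
    """User-side (bus) clock for a fixed memory-to-bus clock ratio."""
    return mem_clock_mhz / ratio


def peak_bandwidth_gbytes(rate_mts: float, bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth of one SODIMM in GB/s (10**9 bytes/s)."""
    return rate_mts * 1e6 * (bus_bits / 8) / 1e9
```

With the configured 1177 ps period, `clock_mhz(1177)` gives about 849.62 MHz, which corresponds to about 1699 MT/s, a user clock of about 212.4 MHz at the 4:1 ratio, and a theoretical peak of about 13.6 GB/s per SODIMM. A 1072 ps period gives the 932.84 MHz figure quoted for the 1866MT/s rating.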
QDR II+ SRAM
Three 9MB Cypress CY7C25652KV18 QDRII+ Quad Data Rate SRAMs are provided for applications that require high speed, low latency memory. Common applications include FIFO buffers and look-up tables. The notion of “Quad” data rate comes from the ability to simultaneously read from a unidirectional read port and write to a unidirectional write port on both clock edges. The QDRII+ SRAMs on the NetFPGA-SUME board are capable of operating at up to 500MHz to yield data transfer rates of up to 1GT/s per 36-bit wide data bus. The Xilinx Memory Interface Generator (MIG) is able to generate and configure a native interface into the QDRII+ via the user friendly wizard tool. More information regarding the QDRII+ memory part and the Xilinx MIG tool can be found in the Cypress CY7C25632KV18/CY7C25652KV18 data sheet, the Cypress Application Note QDR-II, QDR-II+, DDR-II, DDR-II+ Design Guide (AN4065), and the Xilinx 7 Series FPGAs Memory Interface Solutions User Guide (UG586). As QDRA and QDRB share FPGA bank 17, a bank sharing solution that allows QDRA and QDRB to work simultaneously is still in development. Please refer to Xilinx Answer Record 41706 for further information.
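The QDRII+ figures above follow from simple arithmetic, sketched here in Python. The numbers are raw, theoretical rates that count all 36 bits of each port and ignore any controller overhead; the function names are hypothetical.

```python
def transfers_gts(clock_mhz: float = 500) -> float:
    """Transfers per second on one port, in GT/s (both clock edges used)."""
    return clock_mhz * 1e6 * 2 / 1e9


def port_rate_gbps(clock_mhz: float = 500, width_bits: int = 36) -> float:
    """Raw bit rate of one unidirectional port in Gbit/s."""
    return clock_mhz * 1e6 * 2 * width_bits / 1e9


def device_rate_gbps(clock_mhz: float = 500, width_bits: int = 36) -> float:
    """Raw aggregate rate with the read and write ports active together."""
    return 2 * port_rate_gbps(clock_mhz, width_bits)


def capacity_mbytes(density_mbits: int = 72) -> float:
    """Convert a device density in Mbits to MBytes."""
    return density_mbits / 8
```

At 500 MHz each port moves 1 GT/s, or 36 Gbit/s raw, and a device with both ports busy moves 72 Gbit/s raw. A 72 Mbit device holds 9 MB, which matches the "9MB" description of each SRAM.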
Storage
FLASH
Onboard parallel flash is available for storing FPGA bitstreams. For information on writing bitstreams to flash and configuring the FPGA from stored bitstreams, see the section titled “Configuration using Parallel Flash”.
Micro-SD Card
The micro-SD card connector on NetFPGA-SUME board provides a removable non-volatile storage resource. This connector supports a micro-SD memory card and meets all physical layer requirements of both SPI and SD bus protocols.
It supports the UHS-I pin assignment standard (but not UHS-II) and provides high speed signaling at 3.3V to support SC, HC, and XC class SD cards. Please see SD Specifications Part 1 Physical Layer Simplified Specification by the
Technical Committee of the SD Card Association for more details regarding the use of SD memory cards with this connector.
SATA
The NetFPGA-SUME board provides two SATA ports which are SATA-III compatible (6Gbps). Two GTH transceivers (Lanes 0 and 1 on Bank 116) are dedicated to these two ports, with a master clock of 150MHz generated by a Discera DSC1103 Low Jitter Precision LVDS Oscillator. A SATA PHY controller can be generated using the Xilinx 7 Series FPGAs Transceivers Wizard. Please refer to Xilinx Answer Record AR 53364, AR 44587 and UG769 7 Series FPGAs Transceivers Wizard v2.6 User Guide for more information.
PCI Express
The NetFPGA-SUME is designed with a PCI-Express form factor to support interconnection with common processor motherboards. Eight of the FPGA’s high speed serial GTH transceivers are dedicated to implementing eight lanes of Gen3 (8 GT/s per lane) PCIe communications with a host processing system (there is no support for Gen3 x4 configuration). These transceivers work in conjunction with the on-chip 7 Series Integrated PCI Express Block and synthesizable on-chip logic to provide a scalable, high performance PCI Express I/O core. Please refer to the Xilinx 7 Series FPGAs Integrated Block for PCI Express V2.0 (PG054) product guide and 7 Series FPGAs GTX/GTH Transceivers (UG476) user guide for more information.
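The theoretical link bandwidth of the Gen3 x8 interface can be estimated as follows. The sketch assumes the 128b/130b line encoding defined for PCIe Gen3 and ignores packet and protocol overhead, so achievable throughput will be lower. The function name is hypothetical.

```python
def pcie_gen3_bandwidth_gbytes(lanes: int = 8, gt_per_s: float = 8.0) -> float:
    """Theoretical payload bandwidth per direction in GB/s.

    Each lane carries gt_per_s gigatransfers per second, one bit per
    transfer, with 128 payload bits in every 130 bits on the wire.
    """
    encoding_efficiency = 128 / 130
    return lanes * gt_per_s * encoding_efficiency / 8
```

For eight lanes this works out to a little under 7.9 GB/s in each direction.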
SFP+ 10Gbps Ethernet Interface
The NetFPGA-SUME board provides four enhanced small form factor pluggable (SFP+) connectors, each supporting 10Gbps. Four of the FPGA’s high speed serial GTH transceivers on Bank 119 are dedicated to the four SFP+ ports. These connectors are capable of implementing 10GBASE-SR/LR Ethernet protocols.
I2C
Clocking
Expansion Interfaces
FMC
The NetFPGA-SUME board includes a VITA-57 compatible FMC (FPGA Mezzanine Card) carrier connector. A High Pin Count (HPC) connector is used to provide the maximum possible compatibility with a variety of commercially available mezzanine cards. Due to the limitations of the FFG1761 package, the I/O pins of the XC7V690T are connected only to the standard Low Pin Count (LPC) signals on the connector. All of the I/O pins connected to the FMC connector support 1.8V logic only. All 10 differential send/receive pairs for GTH transceivers are also supported.
Please refer to the American National Standards Institute ANSI/VITA 57.1 FPGA Mezzanine Card (FMC) Standard for additional detail regarding standard FMC module and carrier requirements. Refer to Appendix B for specific I/O
constraints relating FPGA pins to their associated FMC control and connector pins.
QTH
PMOD
The NetFPGA-SUME board also provides a Pmod connector for peripheral extension. The Pmod connector is a 2×6, right-angle, 100-mil female connector that mates with standard 2×6 pin headers. The 12-pin Pmod connector provides two 3.3V VCC signals (pins 6 and 12), two Ground signals (pins 5 and 11), and eight logic signals, as shown in Fig 20. The VCC and Ground pins can deliver up to 1A of current. Pmod data signals are not matched pairs, and they are routed using best-available tracks without impedance control or delay matching.
Debug Features
SAMB
LEDs
Reset Button
Push Buttons
Cooling
Michael A
1) ((40+22.5+27+4+49.5+8+3.6)/0.90)+12+0.6=184.4 Watts. However, this number can be a bit misleading as it assumes that a mezzanine module is attached to the FMC connector and is drawing the maximum allowable current from the VADJ, 3P3V, and 12P0V rails simultaneously. According to the FMC specification, which is defined by ANSI/VITA 57.1, an FMC carrier card (NetFPGA-SUME) must be capable of supplying 4A to VADJ (1.8V, VCC1V8), 3A to 3P3V (3.3V, VCC3V3), and 1A to 12P0V (12.0V, VCC12V0). This implies that the mezzanine module can draw up to (1.8*4)+(3.3*3)+(12.0*1)=29.1 Watts.
1/20/2015 https://reference.digilentinc.com/sume:refmanual