3.1 Overview
The MZ7035 development board has a 2-channel SFP interface that supports Gigabit fiber Ethernet communication. The basic logical framework for implementing UDP transmission over Gigabit Ethernet on the development board is shown in the figure below. The FPGA program is based on Milian's new version of the UDP IP protocol stack, together with Xilinx's Tri Mode Ethernet MAC and 1G/2.5G Ethernet PCS/PMA or SGMII IP cores.
This document covers two routines, udp_ip_1g_sfp and udp_ip_1g_sfp_4ch, which implement single-channel and 4-channel Gigabit UDP fiber transmission respectively (MZ7035FA has only two SFPs, while MZ7035FB(D) has four). The routines are developed with Vivado 2017.4.
3.2 SFP interface
There are 4 SFP shielded cages on the development board, each of which accepts a Gigabit SFP module. The SFP signal definition is shown in the figure below.
3.3 IP settings
3.3.1 Tri Mode Ethernet MAC Settings
Since Gigabit communication is used, the rate is set to 1Gbps, as shown in the figure below.
First, since this IP connects to the 1G/2.5G Ethernet PCS/PMA or SGMII IP core through the GMII interface, there is no need to add I/O buffers to the GMII interface inside the IP core. Therefore, PHY Interface must be set to Internal.
Secondly, since 1G/2.5G Ethernet PCS/PMA or SGMII is used for 1G optical communication, its speed is fixed at 1G. Therefore, the MAC speed of the Tri Mode Ethernet MAC needs to be set to 1000Mbps to match it.
When the Tri Mode Ethernet MAC is used with 1G/2.5G Ethernet PCS/PMA or SGMII, the gtx_clk clock source of the Tri Mode Ethernet MAC needs to be provided by 1G/2.5G Ethernet PCS/PMA or SGMII. Generally, the user_clk2 (125MHz) clock output by 1G/2.5G Ethernet PCS/PMA or SGMII is selected as the clock source for Tri Mode Ethernet MAC.
Set the Tri Mode Ethernet MAC to be configured through its AXI-Lite interface.
Set the clock of the AXI-Lite interface to the same frequency as the user_clk2, that is, 125MHz, so that the same clock source can be used.
MDIO is not used in the design to connect to the 1G/2.5G Ethernet PCS/PMA or SGMII IP core, so the MDIO interface is not enabled.
The above settings are shown in the figure below.
Shared logic does not need to be set.
Functions such as audio video bridging (AVB), flow control and statistics counters are not used, as shown in the figure below.
3.3.2 1G/2.5G Ethernet PCS/PMA or SGMII settings
Using 1000BASEX mode, you need to set the speed to 1G, as shown in the figure below.
Select 1000BASEX mode, as shown in the figure below.
The GTX transceiver of the development board serves as the SFP interface, and the MMCM inside the IP core takes the TXOUTCLK clock output by the GTX transceiver as its input clock source. This MMCM generates the user interface clocks we need. The MDIO interface is optional and is not enabled here. Auto-negotiation is enabled, as shown in the figure below.
When the design contains only 1 IP core, the shared logic resources and hardware modules should be included inside the IP core, which reduces the number of generated modules and simplifies the design, as shown in the figure below.
When several of these IP cores are used simultaneously in the design and their GTXs are located in the same GTX BANK, the shared resources (MMCM, GTX PLL, GTX reference clock, etc.) inside one IP core can serve all of the IP cores. That IP core is set to include the shared resources in the core, and the remaining IP cores leave them out, that is, they are set to include the shared resources in the example design.
3.4 IP core structure
3.4.1 Tri Mode Ethernet MAC
3.4.1.1 Clock Network
The internal clock network structure of the IP core is shown in the figure below. Among them, tx_mac_aclk is the synchronization clock of the AXI-Stream transmission interface, and rx_mac_aclk is the synchronization clock of the AXI-Stream reception interface. Since the MDIO interface is not used in the design, the clock signal mdc does not exist.
gtx_clk is the global clock source of the IP core and runs at 125MHz. s_axi_aclk is the synchronous clock of the AXI-Lite interface. The other clocks, such as refclk and gtx_clk90, relate to the GMII and RGMII interfaces and to external PHY chips. Because the IP core connects to 1G/2.5G Ethernet PCS/PMA or SGMII in this design, these clocks are not required.
3.4.1.2 User Interface
Here are some important user interfaces. For the other interfaces, refer to the IP core manual.
3.4.1.2.1 AXI-Stream Receiver Interface
The AXI-Stream receiving interface signals are shown in the figure below. The user receives Ethernet packets output by the IP core through this interface. Note that the receiving interface does not use the tready signal of the AXI-Stream standard. The receiving logic must therefore be able to accept data continuously; otherwise, data that arrives before it is ready will be lost.
The timing of the AXI-Stream receiving interface is shown in the figure below.
3.4.1.2.2 AXI-Stream Send Interface
The AXI-Stream sending interface signals are shown in the figure below. Through this interface, the user passes the Ethernet packets to be sent to the IP core. tx_ifg_delay sets the interval between transmitted frames. The minimum interval is generally used, which is selected by setting tx_ifg_delay to 0.
The timing of the AXI-Stream sending interface is shown in the figure below.
3.4.1.2.3 Receive and send data statistics
The signals in the figure below output statistics for the frame currently being sent or received. They are generally not needed except for debugging.
The signal timing is shown in the figure below.
3.4.1.2.4 Flow control signal
When the link does not carry sustained high-bandwidth, heavy-load traffic, the flow control function is generally not required. In that case no pause frames need to be sent, and the following two signals are set to 0.
3.4.1.2.5 AXI-Lite interface
The AXI-Lite interface is mainly used to configure and read registers inside the IP core. In addition, it can be used to configure the registers of an external PHY chip or of the 1G/2.5G Ethernet PCS/PMA or SGMII IP core through the MDIO interface. Since MDIO is not used in this design, the AXI-Lite interface is used mainly for the IP core's own settings.
3.4.1.2.6 Reset signal
The reset signal network of the IP core is shown in the figure below.
glbl_rstn is the global reset signal and resets the entire IP core. rx_axi_rstn and tx_axi_rstn reset the receiving and sending logic separately and are generally not needed. In the routine, both rx_axi_rstn and tx_axi_rstn are set to 1.
tx_reset and rx_reset indicate the reset state of the sending and receiving parts of the IP core, so they can be used to determine whether the IP core is still in reset. These two signals should be used together with tx_mac_aclk and rx_mac_aclk, the synchronization clocks of the AXI-Stream sending and receiving interfaces, because the IP core may only start to output tx_mac_aclk and rx_mac_aclk when tx_reset and rx_reset change from 1 to 0. Logic clocked by tx_mac_aclk or rx_mac_aclk must therefore derive its reset from tx_reset or rx_reset; otherwise the reset may be released before the clock is running and have no effect. A reference design for this is given in the routine.
3.4.2 1G/2.5G Ethernet PCS/PMA or SGMII
3.4.2.1 Clock Network
The internal clock network structure of the IP core is shown in the figure below.
3.4.2.2 Multi-IP resource sharing
In the routine udp_ip_1g_sfp_2ch, 2 SFP interfaces are used at the same time, and 2 IP cores are instantiated in the program. The 2 IP cores share the MMCM, GTX PLL and GTX reference clock.
3.4.2.2.1 Clock Network
When multiple IP cores are needed and the GTXs they use are located in the same GTX BANK, the clock networks of the different IP cores can be shared, as shown in the figure below. The IP cores can share the reference clock of the GTX BANK and the clock signals output by a single MMCM.
3.4.2.2.2 Shared port
The shared signal connection between multiple IP cores is shown in the figure below.
In the above figure, the IP core on the left includes the shared resources inside the IP core, while the IP core on the right removes the shared resources from the IP core and places them in the example design. In the udp_ip_1g_sfp_2ch routine, 2 IP cores are instantiated: gig_ethernet_pcs_pma_i_1 corresponds to the IP core on the left of the figure, and pcs_pma_i_2 corresponds to the IP core on the right. Which role a core takes is determined by the shared logic setting described in the IP settings above. The settings of pcs_pma_i_1 are shown in the figure below.
The settings of pcs_pma_i_2 are shown in the figure below.
The definition and connection relationship of each shared signal are shown in the figure below.
3.4.3 User Interface
Here are some important user interfaces. For the other interfaces, refer to the IP core manual.
3.4.3.1 GMII interface
The synchronous clock of the GMII interface is userclk2, an output clock of the IP core.
3.4.3.1.1 GMII sending timing
3.4.3.1.2 GMII Receive Timing
3.4.3.2 independent_clock_bufg
independent_clock_bufg is an input clock with a frequency of 200MHz. In the example design of the IP core, the GMII interface passes through IDELAYE2 and is connected to the chip pins as I/O, and the 200MHz clock on independent_clock_bufg serves as the reference clock for IDELAYCTRL. In this routine, GMII is an internal signal and is not brought out to I/O ports, so IDELAYCTRL is not required.
However, independent_clock_bufg is also used by other logic inside the IP core, so it must be driven with a 200MHz clock regardless of whether IDELAYCTRL is required. The suffix bufg indicates that the 200MHz clock reaches the independent_clock_bufg port over the global clock network, through a BUFG.
3.4.3.3 signal_detect
For the IP core to work properly, signal_detect needs to be set to 1.
3.4.3.4 Configuration_Vector
Configuration_Vector is used to configure the basic working mode of the IP core and can replace the functions of the MDIO interface. The specific meaning is shown in the figure below.
In the routine, the configuration of Configuration_Vector is as follows:
3.4.3.5 an_adv_config_vector
an_adv_config_vector is used to configure the auto-negotiation function of the IP core, and its specific meaning is shown in the figure below. For the 1000BASEX mode, you only need to pay attention to bit5, bit8~7 and bit13~12.
In this routine, an_adv_config_vector is set as follows: full duplex is enabled, pause is not used for flow control, and no error status is reported.
assign an_adv_config_vector = 16'b0000000000100001;
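As a quick cross-check of the value above, the following Python sketch (not part of the routine) extracts only the fields that the text names for 1000BASEX mode: bit5, bit8~7 and bit13~12. The function name and the dictionary keys are made up for illustration and follow the wording of this section; all other bits, including bit0, are deliberately left uninterpreted.

```python
def decode_an_adv_config_vector(value: int) -> dict:
    """Extract the bits this section calls out for 1000BASEX mode."""
    return {
        "full_duplex": (value >> 5) & 0x1,    # bit5
        "pause": (value >> 7) & 0x3,          # bit8~7
        "error_status": (value >> 12) & 0x3,  # bit13~12
    }

# Value used in the routine: 16'b0000000000100001
fields = decode_an_adv_config_vector(0b0000000000100001)
print(fields)
```

With the routine's value, only the full-duplex bit among these fields is set, which agrees with the description above.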
3.4.3.6 Status_Vector
Status_Vector reflects the working status of the IP core, and its specific meaning is shown in the figure below. Several of these signals can be connected to LEDs for observation. The more important signals are bit0, bit1, and bit12.
3.4.4 Connection between Tri Mode Ethernet MAC and 1G/2.5G Ethernet PCS/PMA or SGMII
In the design, the Tri Mode Ethernet MAC is connected to 1G/2.5G Ethernet PCS/PMA or SGMII through the GMII interface. The operating clock source of the Tri Mode Ethernet MAC IP core is userclk2, output by 1G/2.5G Ethernet PCS/PMA or SGMII at a frequency of 125MHz. userclk2 is also the clock to which the signals on the GMII interface are synchronous. The GMII interface connection is shown in the figure below. Since the MDIO interface is not used in this routine, it does not need to be connected.
3.4.5 Notes on IP core usage
3.4.5.1 Tri Mode Ethernet MAC
3.4.5.1.1 Data sending length
By default, when a frame passed to the Tri Mode Ethernet MAC IP core is too short to form the 64-byte minimum Ethernet frame, the IP core automatically pads the end of the frame with zeros and then appends the 4-byte CRC, so that the frame on the wire is 64 bytes long including the CRC. During reception, the IP core automatically removes the 4-byte CRC, but the zeros padded onto a short frame are not removed and are still output through the reception interface. The length of a frame that can be sent is therefore between 14 and 1514 bytes (including the 14-byte MAC frame header).
The IP core documentation describes this behavior as shown below.
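The padding behavior can be worked through numerically. The Python sketch below is only an illustration and assumes the standard IEEE 802.3 minimum frame of 64 bytes including the 4-byte CRC; the function names are hypothetical. It shows how many zero bytes are added on transmission and how long the frame appears at the receiving interface, where the CRC is stripped but the padding is kept.

```python
MAC_HEADER_LEN = 14   # destination MAC + source MAC + EtherType
CRC_LEN = 4           # CRC appended by the MAC on transmission
MIN_FRAME_LEN = 64    # assumed minimum frame on the wire, CRC included

def tx_padding(user_len: int) -> int:
    """Zero bytes added to a frame of user_len bytes (header included, CRC excluded)."""
    return max(0, MIN_FRAME_LEN - CRC_LEN - user_len)

def rx_len(user_len: int) -> int:
    """Length seen at the receiving interface: padding kept, CRC removed."""
    return user_len + tx_padding(user_len)

for n in (14, 42, 60, 1514):
    print(n, tx_padding(n), rx_len(n))
```

Because the padding is not removed on reception, the receiving logic has to rely on the length fields of the upper-layer protocol (for example the IP and UDP length fields) to discard the trailing zeros of short frames.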
3.4.5.2 AXI-Lite interface configuration policy
The example design that comes with Vivado shows the process of configuring the IP core through AXI-Lite, as shown in the figure below. For the specific register definitions, refer to PG051. The routine uses this part of the code directly.
Note that the last three steps all relate to the frame filtering function.
Among them, the unicast address step sets the local MAC address of the IP core, which is used for address matching during filtering. If the frame filtering function is enabled, frames whose destination MAC address is not the broadcast address, the pause address or the local MAC address are filtered out and are not passed to the user.
The last step enables the promiscuous mode of the IP core, which turns off receive frame filtering so that the IP core receives Ethernet frames with any destination MAC address. If the user needs MAC address filtering, promiscuous mode can be turned off. The related registers are shown in the figure below.
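The filtering rule described above can be summarized in a small behavioral model. This Python sketch is an illustration only, not the IP core's implementation: the function name and the example MAC addresses are made up, and the pause address is assumed to be the standard IEEE 802.3 MAC control multicast address 01-80-C2-00-00-01.

```python
BROADCAST = bytes.fromhex("ffffffffffff")
PAUSE_ADDRESS = bytes.fromhex("0180c2000001")  # assumed standard pause multicast address

def frame_accepted(dest_mac: bytes, local_mac: bytes, promiscuous: bool) -> bool:
    """Model of the receive filter decision for one destination MAC address."""
    if promiscuous:  # filtering turned off: every frame reaches the user
        return True
    return dest_mac in (BROADCAST, PAUSE_ADDRESS, local_mac)

local = bytes.fromhex("000a35000102")  # made-up local MAC for illustration
other = bytes.fromhex("000a350001ff")  # made-up foreign MAC
print(frame_accepted(other, local, promiscuous=False))
print(frame_accepted(other, local, promiscuous=True))
print(frame_accepted(BROADCAST, local, promiscuous=False))
```

In the routine, promiscuous mode is enabled, so the first case does not occur; it becomes relevant only if the user turns promiscuous mode off and relies on the unicast address.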
3.5 Constraints
3.5.1 GTX reference clock constraints
When using GTX, the differential reference clock input of the GTX BANK must be constrained with its pin location and clock frequency, as shown below (this is sample code; if it differs from the supporting code, the supporting code and schematic diagram prevail):
set_property PACKAGE_PIN U6 [get_ports gtrefclk1_p]
set_property PACKAGE_PIN U5 [get_ports gtrefclk1_n]
create_clock -period 8.000 -name gtrefclk -add [get_ports gtrefclk1_p]
3.5.2 GTX position constraints
In the XDC file, the GTX used by each IP core must be locked to a specific location in the FPGA chip, and the path of the GTX primitive in the project must be specified.
The MZ7035 development board includes 1 GTX BANK. The four SFP modules are connected to the GTXs at X0Y12, X0Y13, X0Y14 and X0Y15 respectively.
For example, the constraints in XDC in the udp_ip_1g_sfp routine are as follows:
set_property LOC GTXE2_CHANNEL_X0Y15 [get_cells gig_ethernet_pcs_pma_i_1/*/*/transceiver_inst/gtwizard_inst/*/gtwizard_i/gt0_GTWIZARD_i/gtxe2_i]
For example, the constraints in XDC in the udp_ip_1g_sfp_4ch routine are as follows:
set_property LOC GTXE2_CHANNEL_X0Y12 [get_cells gig_ethernet_pcs_pma_i_4/*/transceiver_inst/gtwizard_inst/*/gtwizard_i/gt0_GTWIZARD_i/gtxe2_i]
set_property LOC GTXE2_CHANNEL_X0Y13 [get_cells gig_ethernet_pcs_pma_i_3/*/transceiver_inst/gtwizard_inst/*/gtwizard_i/gt0_GTWIZARD_i/gtxe2_i]
set_property LOC GTXE2_CHANNEL_X0Y14 [get_cells gig_ethernet_pcs_pma_i_2/*/transceiver_inst/gtwizard_inst/*/gtwizard_i/gt0_GTWIZARD_i/gtxe2_i]
set_property LOC GTXE2_CHANNEL_X0Y15 [get_cells gig_ethernet_pcs_pma_i_1/*/*/transceiver_inst/gtwizard_inst/*/gtwizard_i/gt0_GTWIZARD_i/gtxe2_i]
Because locking the GTX primitive fixes the physical position of the GTX in the chip, and each GTX maps one-to-one to its RX and TX pins, the RX and TX pins of the GTX do not need to be constrained.
3.6 Routine Design
This tutorial designs two test routines: udp_ip_1g_sfp and udp_ip_1g_sfp_4ch.
The principle of the routines is shown in the figure below. Both routines implement a UDP loopback: the network debugging assistant on the computer sends any UDP packet of less than 1472 bytes to the development board, and the development board receives the UDP packet, buffers it in a FIFO and sends it back to the computer, thereby verifying that data is transmitted and received correctly.
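The 1472-byte figure follows from the standard Ethernet payload size. The short Python calculation below is a sketch that assumes a 1500-byte Ethernet payload, a 20-byte IPv4 header without options and an 8-byte UDP header; it also reproduces the 1514-byte maximum frame length (MAC header included, CRC excluded) mentioned in the notes on the Tri Mode Ethernet MAC.

```python
ETH_PAYLOAD_MAX = 1500  # assumed standard Ethernet payload size in bytes
MAC_HEADER_LEN = 14     # destination MAC + source MAC + EtherType
IPV4_HEADER_LEN = 20    # IPv4 header without options
UDP_HEADER_LEN = 8

max_udp_payload = ETH_PAYLOAD_MAX - IPV4_HEADER_LEN - UDP_HEADER_LEN
max_frame_len = MAC_HEADER_LEN + ETH_PAYLOAD_MAX
print(max_udp_payload, max_frame_len)
```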
The AXI-Stream data interface of the UDP IP protocol stack is 64 bits wide, while that of the Tri Mode Ethernet MAC is 8 bits wide. Therefore, interconnecting the UDP IP protocol stack and the Tri Mode Ethernet MAC through the AXI-Stream interface requires both clock domain and data bit width conversion. The implementation is shown in the figure below.
3.6.1 AXI-Stream DATA FIFO
The sending and receiving paths each use 2 AXI-Stream DATA FIFOs: one performs the asynchronous clock domain conversion, and the other, a synchronous FIFO, provides data buffering and the Packet mode function.
At the 1G rate, the AXI-Stream data interface of the Tri Mode Ethernet MAC is clocked at 125MHz. To carry the same data rate, the 64-bit AXI-Stream data interface of the UDP IP protocol stack should be clocked at 125MHz/(64/8)=15.625MHz. Therefore, the clocks on the two sides of the asynchronous AXI-Stream DATA FIFO are 125MHz (8bit) and 15.625MHz (64bit) respectively.
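The clock relationship can be checked with a couple of lines of arithmetic. The Python sketch below uses a hypothetical helper name and simply confirms that both sides of the FIFO carry the same 1000 Mbit/s when the 64-bit side runs at 15.625MHz.

```python
def stream_clock_mhz(mac_clock_mhz: float, mac_bits: int, stack_bits: int) -> float:
    """Clock at which a stack_bits-wide stream matches the throughput of the MAC side."""
    return mac_clock_mhz * mac_bits / stack_bits

udp_clk = stream_clock_mhz(125.0, 8, 64)
print(udp_clk)                   # 15.625
print(125.0 * 8, udp_clk * 64)   # throughput of each side in Mbit/s
```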
Packet mode means that the FIFO keeps buffering the data arriving on its AXI-Stream input without outputting anything until tlast is asserted at the input, that is, until a complete packet has been stored; only then does it start to output the data on its AXI-Stream output. The Packet mode settings are shown in the figure below. Note that when Packet mode is enabled, the FIFO must work in synchronous mode.
For the sending path, Packet mode is enabled to prevent the FIFO from being read empty by the IP core in the middle of a frame. For the receive path, it is enabled because Milian's UDP IP protocol stack requires the tvalid signal to stay at 1 for the entire duration of a packet.
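The effect of Packet mode can be illustrated with a small behavioral model. The Python class below is a simplified sketch, not the AXI-Stream DATA FIFO itself: it ignores clocks, depth limits and backpressure, and its names are made up. It only shows the key property that no beat becomes readable until the tlast of its packet has been written.

```python
from collections import deque

class PacketModeFifo:
    """Behavioral model: beats become readable only after their packet's tlast arrives."""

    def __init__(self):
        self._pending = []      # beats of the packet still being written
        self._ready = deque()   # beats of complete packets, readable in order

    def write(self, tdata: int, tlast: bool) -> None:
        self._pending.append((tdata, tlast))
        if tlast:               # packet complete: release it to the read side
            self._ready.extend(self._pending)
            self._pending.clear()

    def tvalid(self) -> bool:
        return bool(self._ready)

    def read(self):
        return self._ready.popleft()

fifo = PacketModeFifo()
fifo.write(0x11, False)
fifo.write(0x22, False)
before = fifo.tvalid()   # tlast not yet written
fifo.write(0x33, True)
after = fifo.tvalid()    # whole packet is now buffered
packet = [fifo.read() for _ in range(3)]
print(before, after, packet)
```

Once a packet is released, the reader can drain it without gaps, which is the property both the sending path and the receive path rely on.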
3.6.2 AXI4-Stream Data Width Converter
After the AXI-Stream interface of the UDP IP protocol stack is converted through the FIFO clock domain, the data bit width conversion is also required. The data bit width conversion is completed through the AXI4-Stream Data Width Converter.
In the receiving path, 8-bit to 64-bit conversion is performed, and the AXI4-Stream Data Width Converter settings are shown in the figure below.
In the sending path, 64-bit to 8-bit conversion is performed, and the AXI4-Stream Data Width Converter settings are shown in the figure below.
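The two conversions can be modeled in a few lines. The Python sketch below is an illustration under stated assumptions, not a description of the converter's internals: it assumes that the first byte of the stream occupies the lowest byte lane of the 64-bit word and that tkeep marks the valid lanes of a partially filled last beat. The function names are hypothetical.

```python
def pack_beats(frame: bytes, width_bytes: int = 8):
    """8-bit stream to wide beats: returns a list of (tdata, tkeep, tlast)."""
    beats = []
    for offset in range(0, len(frame), width_bytes):
        chunk = frame[offset:offset + width_bytes]
        tdata = int.from_bytes(chunk, "little")  # first byte in the lowest lane (assumption)
        tkeep = (1 << len(chunk)) - 1            # one bit per valid byte lane
        tlast = offset + width_bytes >= len(frame)
        beats.append((tdata, tkeep, tlast))
    return beats

def unpack_beats(beats, width_bytes: int = 8) -> bytes:
    """Wide beats back to an 8-bit stream, dropping lanes whose tkeep bit is 0."""
    out = bytearray()
    for tdata, tkeep, _tlast in beats:
        lanes = tdata.to_bytes(width_bytes, "little")
        out += bytes(b for i, b in enumerate(lanes) if (tkeep >> i) & 1)
    return bytes(out)

frame = bytes(range(10))
beats = pack_beats(frame)
print([(hex(d), bin(k), l) for d, k, l in beats])
print(unpack_beats(beats) == frame)
```

A 10-byte frame becomes two 64-bit beats, the second of which carries only two valid bytes; this is why the byte-lane qualifier matters when the packet length is not a multiple of 8.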
3.7 Routine Testing
udp_ip_1g_sfp implements a single-channel UDP network transmission function. udp_ip_1g_sfp_4ch instantiates 4 UDP IP protocol stacks at the same time, implementing a 4-channel UDP network transmission function. In the routines, the computer's IP address is 192.168.10.2 and its UDP port number is 61441; the IP address of each of the four SFP interfaces on the development board is 192.168.10.1 and the UDP port number is 61440.
For the udp_ip_1g_sfp routine, insert the SFP optoelectronic module and network cable into the shielded cage corresponding to SFP-A. For the udp_ip_1g_sfp_4ch routine, the SFP electrical module and network cable can be connected to any of the SFP-A, SFP-B, SFP-C and SFP-D shielded cages.
Before testing, you need to set the IP address of the computer network card used to 192.168.10.2 and the subnet mask to 255.255.255.0, as shown in the figure below.
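A quick way to confirm that these settings put the computer and the development board on the same subnet is the standard-library ipaddress module. The sketch below uses only the addresses and mask given in this section.

```python
import ipaddress

# Computer network card settings from this section
pc = ipaddress.ip_interface("192.168.10.2/255.255.255.0")
# IP address of the development board
board = ipaddress.ip_address("192.168.10.1")

print(pc.network, board in pc.network)
```

If the board address were outside the computer's network, the computer would send the packets to its gateway instead of resolving the board directly with ARP.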
3.7.1 UDP loop test
Open the network debugging assistant, set the IP addresses and UDP port numbers of the computer and the development board, and send text data to the development board as UDP packets, continuously at 1ms intervals, as shown in the figure below.
The test results are shown in the figure above. As can be seen from the figure, the UDP packets sent by the network debugging assistant are the same as the UDP packets returned by the development board, and the numbers of packets sent and received are the same.
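The same loopback check can also be scripted instead of using the network debugging assistant. The Python sketch below is one possible approach, not part of the supplied routines; the function names are made up. So that it runs without hardware, it exercises the check against a stand-in echo socket on 127.0.0.1. For a real test, the computer-side socket would be bound to 192.168.10.2 port 61441 and the target would be 192.168.10.1 port 61440, as given above.

```python
import socket
import threading

def udp_loopback_check(sock, target, payload: bytes, timeout: float = 1.0) -> bool:
    """Send payload to target and report whether identical bytes come back."""
    sock.settimeout(timeout)
    sock.sendto(payload, target)
    try:
        echoed, _addr = sock.recvfrom(2048)
    except socket.timeout:
        return False
    return echoed == payload

def echo_once(server: socket.socket) -> None:
    """Stand-in for the development board: return one datagram to its sender."""
    data, addr = server.recvfrom(2048)
    server.sendto(data, addr)

# Local dry run. For the real test, bind the client to ("192.168.10.2", 61441)
# and use ("192.168.10.1", 61440) as the target instead of the stand-in server.
server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
server.bind(("127.0.0.1", 0))
threading.Thread(target=echo_once, args=(server,), daemon=True).start()

client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
client.bind(("127.0.0.1", 0))
ok = udp_loopback_check(client, server.getsockname(), b"hello MZ7035")
print(ok)
client.close()
server.close()
```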
The communication packets between the computer and the development board can be captured with the Wireshark software, as shown in the figure below.
3.7.2 Ping, ARP test
While the network debugging assistant sends UDP packets to the development board, ping commands are issued continuously to the development board from the cmd command window, and the ping replies are observed, as shown in the figure below. From the figure, we can see that the development board responds quickly to the ping commands initiated by the computer, and that UDP packet transmission and ping responses do not interfere with each other.
The data of the ping process is verified with the Wireshark software, as shown in the figure below.
During the entire test process, the computer periodically sends ARP requests to the development board, and the development board responds in a timely manner. The ARP communication process between the computer and the development board captured by the Wireshark software is shown in the figure below.