草榴社区

Anatomy of an Integrated Ethernet PHY IP for High Performance Computing SoCs

Priyank Shukla, Staff Product Marketing Manager, 草榴社区

Introduction

Hyperscale data centers need to scale appropriately as demand on the system increases. For the required server-to-server communication in hyperscale data centers, Ethernet has become the primary network protocol of choice because it allows hyperscalers to disaggregate network switches and install their software operating systems independently. Ethernet enables cost-effective, dense, open switches and networking technologies that reduce cost and power per bit with transistor scaling. Ethernet is a computer networking technology that defines the physical and data-link layers of the Open Systems Interconnection (OSI) model. The IEEE 802.3 standard describes these functions architecturally, with emphasis on the logical divisions of the system and how they fit together. The data link layer, which includes the Media Access Control (MAC) sublayer, creates Ethernet data frames and uses the underlying Ethernet physical layer to transfer them over a medium.
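
To make the division of labor concrete, the sketch below lays out the fields of a basic Ethernet II frame that a MAC assembles before handing it to the PHY. The struct and its field names are illustrative only; on the wire the frame is a serialized byte stream preceded by a preamble, not a packed C structure.

#include <stdint.h>

/* Illustrative layout of a basic Ethernet II frame as assembled by the MAC.
 * Field names are hypothetical; the 4-byte frame check sequence (FCS)
 * terminates the frame on the wire. */
struct eth_frame {
    uint8_t  dest_mac[6];      /* destination MAC address */
    uint8_t  src_mac[6];       /* source MAC address */
    uint16_t ethertype;        /* payload type, e.g. 0x0800 for IPv4 */
    uint8_t  payload[1500];    /* 46..1500 bytes of upper-layer data */
    uint32_t fcs;              /* CRC-32 frame check sequence */
};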

This article describes the Ethernet PHY used in high-performance computing (HPC) system-on-chips (SoCs) and how an integrated MAC + PHY IP can accelerate the path to compliance and design closure.
 

Ethernet Physical Layer or PHY

The Ethernet physical layer, or PHY, is the abstraction layer that transmits and receives data. The PHY encodes data frames for transmission and decodes received frames with a specific modulation, speed of operation, transmission media type, and supported link length.

The 草榴社区 article, Understanding the Ethernet Nomenclature – Data Rates, Interconnect Mediums and Physical Layer, describes the Ethernet PHY nomenclature, which is based on data rate, modulation, and media type, in more detail. Speed-specific Media Independent Interfaces (MIIs) allow the use of various PHY devices for operation over different media types such as twin-axial copper (BASE-C), twisted pair (BASE-T), electrical backplanes (BASE-K), or fiber optic cables (BASE-L/R).
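
As a rough illustration of how the nomenclature encodes the medium, the hypothetical helper below maps the designator letter that follows "BASE-" to the media types listed above. Real PHY names also carry reach, coding, and lane-count information, so this is only a sketch.

#include <stdio.h>
#include <string.h>

/* Hypothetical helper: maps the designator letter after "BASE-" to the
 * media types named above. Illustrative only; real PHY names also encode
 * reach, coding scheme, and lane count (e.g., the trailing digit). */
static const char *phy_medium(const char *name)
{
    const char *p = strstr(name, "BASE-");
    if (p == NULL)
        return "unknown";
    switch (p[5]) {               /* first character after "BASE-" */
    case 'C': return "twin-axial copper";
    case 'T': return "twisted pair";
    case 'K': return "electrical backplane";
    case 'S':
    case 'L': return "fiber optic cable";
    default:  return "other designator";
    }
}

int main(void)
{
    const char *examples[] = { "400GBASE-KR4", "100GBASE-CR2", "10GBASE-T" };
    for (int i = 0; i < 3; i++)
        printf("%-14s -> %s\n", examples[i], phy_medium(examples[i]));
    return 0;
}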

For example, most personal computer users are familiar with the “Ethernet cables” used with their laptops/PCs. Figure 1 shows a simplified block diagram describing how data is transferred to and from the processor over an Ethernet cable. In this use case, Ethernet data frames (packets of data), assembled by the Ethernet MAC in the CPU, travel across the motherboard (a printed circuit board) through the MII/GMII defined by the IEEE 802.3 standard before reaching an Ethernet PHY, which transmits electrical signals over twisted pair cables through RJ45 connectors.

Figure 1: A simplified example of Ethernet data packets traveling from the processor to the Ethernet PHY in a personal computer

Ethernet PHY in Hyperscale Data Centers

Figure 2 shows a data center as a network of compute and storage systems connected with optical and copper media. Optics offers a power-efficient way to implement long-distance Ethernet links, with Single Mode Fiber (SMF) providing the longest reach as it uses a single path through the fiber. Multi-Mode Fiber (MMF) offers a more cost-effective alternative to SMF and is typically used for distances of 500 meters or less. Links from a server rack unit to the Top-of-Rack (ToR) switch are generally implemented with twin-axial copper cables, commonly referred to as Direct Attach Copper (DAC) cables.

Figure 2: Usage of different types of optics in a server-to-server communication in a data center

Figure 3: Movement of data packets in a rack unit of a server farm

Figure 3 shows data packets traveling in a data center, originating from a processor in one of the rack units of a server farm. Data from the processor goes to the network interface card (NIC) through a PCIe interface. The NIC creates Ethernet frames by implementing the MAC functions. The frames travel to the Top-of-Rack (ToR) switch through a twin-axial copper PHY over a DAC cable. Depending on the DAC cable length and the physical location of the switch silicon in the ToR rack unit, retimers may be used. These retimers implement back-to-back backplane Ethernet PHYs to extend the reach of the electrical signals. The ToR switch routes the frames, and the optical module converts the medium from electrical to optical by implementing both electrical and optical PHY functions.

Integrated Electrical Ethernet PHY

IEEE 802.3-2018 and the Ethernet Technology Consortium (ETC) define the 400 Gb/s and 800 Gb/s Ethernet standards, respectively. It is important to note that 800 Gb/s Ethernet is based on the 400 Gb/s Ethernet access method and physical layer standards from IEEE 802.3-2018 and IEEE 802.3ck.

Figure 4: 400 Gb/s Ethernet PHY architecture

Figure 4 illustrates the 400 Gb/s Ethernet PHY at an architectural abstraction level, showing that an 800 Gb/s or 400 Gb/s electrical Ethernet PHY implements:

  • Physical Coding Sublayer (PCS), which provides the services required by the 200GMII/400GMII, such as:
    • DC balancing: The PCS implements 64b/66b line coding and scrambling to maintain transition density and DC balance (see the scrambler sketch after this list)
    • Transferring encoded data to (from) the Physical Medium Attachment (PMA)
    • Compensating for rate differences between the 200GMII/400GMII and the PMA: Such differences arise from the insertion or deletion of alignment markers, or from clocking mismatches, and the PCS corrects them by inserting or deleting idle control characters
    • Transcoding from 66-bit blocks to (from) 257-bit blocks
    • Implementing Forward Error Correction (FEC) functions: FEC techniques correct errors at the receiver through coding and are used to improve the link Bit Error Rate (BER). However, coding gain and the associated BER improvement come at the cost of increased latency. Considering this trade-off, different FECs can be implemented based on the raw link BER. Typically, for links with a BER between 10^-8 and 10^-5, a Reed-Solomon FEC is used as per the standard. For links with a BER between 10^-12 and 10^-8, a Fire-code-based FEC is used. Finally, for links with a BER better than 10^-12, FEC may not be needed (see the selection sketch after this list)
    • Adjusting equalizer tap coefficients via the protocol-defined link training functions for backplane applications
  • The PMA layer adapts the PCS-formatted signal to an appropriate number of abstract or physical lanes, recovers the clock from the received signal, and provides various transmit and receive test patterns as well as local loopback operations
  • The Physical Medium Dependent (PMD) layer interfaces the PHY to the transmission medium, which can be one of many different types of optical or copper cables
  • The auto-negotiation layer enables a device to learn the remote-end device’s capabilities and status. Clause 73 of the IEEE 802.3 standard defines a common auto-negotiation protocol whose signaling is independent of the standard speed modes. Auto-negotiation allows devices to advertise and share information, including speed, modes, fault signaling, and other control information
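
The DC-balancing item above refers to the self-synchronizing scrambler used with 64b/66b coding, whose polynomial is 1 + x^39 + x^58. The bit-serial model below is only a behavioral sketch with an assumed bit ordering; a real PCS implements the scrambler as wide parallel logic.

#include <stdint.h>
#include <stdio.h>

/* Minimal bit-serial model of the self-synchronizing 64b/66b scrambler
 * (polynomial 1 + x^39 + x^58) that the PCS applies to the 64-bit payload
 * of each 66-bit block to maintain transition density and DC balance. */
static uint64_t scramble64(uint64_t data, uint64_t *state)
{
    uint64_t out = 0;
    for (int i = 0; i < 64; i++) {
        uint64_t in_bit = (data >> i) & 1;
        /* new bit = input XOR bits output 39 and 58 cycles earlier */
        uint64_t s_bit  = in_bit ^ ((*state >> 38) & 1) ^ ((*state >> 57) & 1);
        *state = ((*state << 1) | s_bit) & ((1ULL << 58) - 1);  /* 58-bit state */
        out |= s_bit << i;
    }
    return out;
}

int main(void)
{
    uint64_t state = (1ULL << 58) - 1;  /* arbitrary non-zero seed */
    printf("0x%016llx\n", (unsigned long long)scramble64(0x0ULL, &state));
    return 0;
}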

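The FEC trade-off described above can be summarized as a simple selection rule over the raw link BER. The function below is purely illustrative and uses the thresholds cited in the text; in practice the FEC type is fixed by the PHY type and the negotiated capabilities rather than computed at run time.

#include <stdio.h>

/* Illustrative mapping from raw link BER to the FEC choice described above. */
typedef enum { FEC_NONE, FEC_FIRECODE, FEC_REED_SOLOMON } fec_t;

static fec_t select_fec(double raw_ber)
{
    if (raw_ber > 1e-8)        /* roughly 1e-8 .. 1e-5: strongest coding gain needed */
        return FEC_REED_SOLOMON;
    if (raw_ber > 1e-12)       /* roughly 1e-12 .. 1e-8 */
        return FEC_FIRECODE;
    return FEC_NONE;           /* link already clean enough */
}

int main(void)
{
    const double samples[] = { 1e-6, 1e-10, 1e-15 };
    const char *names[] = { "none", "firecode", "reed-solomon" };
    for (int i = 0; i < 3; i++)
        printf("BER %.0e -> %s FEC\n", samples[i], names[select_fec(samples[i])]);
    return 0;
}
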
An integrated Ethernet PHY IP that includes the PCS, PMA, PMD, and auto-negotiation functionality enables faster adoption of the latest 800 Gb/s and 400 Gb/s Ethernet rates. Figure 5 shows an example implementation of an 800G/400G PCS.

Figure 5: Ethernet PCS block diagram

Summary

Ethernet has become the de facto standard for server-to-server communication in modern HPC data centers. Ethernet data frames travel through the server units over various channels and media types. Integrating the MAC and PHY in an Ethernet system reduces design turnaround time and offers differentiated performance. 草榴社区 provides a complete 200G/400G and 800G Ethernet controller and PHY IP solution that includes the PCS, PMD, PMA, and auto-negotiation functionality, as shown in Figure 6.

Figure 6: Complete 200G/400G and 800G Ethernet controller and PHY IP solution 

The DesignWare® 112G Ethernet PHY IP delivers exceptional signal integrity and jitter performance that exceeds the IEEE 802.3ck and OIF electrical specifications. The area-efficient PHY demonstrates zero BER with more than 42 dB of channel loss and offers power efficiency of less than 5 pJ/bit. The DesignWare 200G/400G and 800G Ethernet MAC and PCS support the IEEE 802.3 and consortium specifications, including Reed-Solomon Forward Error Correction (FEC) and low-jitter timestamping for maximum precision.