草榴社区

Data centers are grappling with an escalating need for bandwidth, driven primarily by emerging technologies and in particular by the growing demand for AI/ML applications. As these technologies evolve, bandwidth requirements are expected to grow exponentially. Specifically, as Large Language Models (LLMs) become more precise and expansive, they demand increasingly higher processing speeds, and this surge in demand has highlighted inefficiencies within data centers. This technical bulletin examines PCIe over Optics, a promising solution for escalating bandwidth demands in data centers. We’ll explore resource limitations, latency challenges, and energy consumption.

PCIe is the interface of choice within rack servers, linking resources together through copper cables or a backplane. Having seen over six generations of deployment, and with ratification of the PCIe 7.0 specification approaching, PCIe will continue to be a key player for high-speed interconnects. Figure 1 illustrates the full stack of data communication over the PCIe link, alongside its associated components.

Figure 1: Full stack of data communication over PCIe link
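
To put the generational progression in numbers, the following minimal sketch computes the raw, per-direction bandwidth of a link from the per-lane signaling rate. The function and table names are illustrative, the Gen 7 entry uses the 128 GT/s target rate, and encoding and protocol overhead are deliberately ignored, so real throughput will be lower.

```python
# Per-lane raw signaling rates in GT/s for each PCIe generation.
# The Gen 7 entry is the 128 GT/s target rate of the upcoming specification.
PCIE_RATES_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0, 7: 128.0}


def raw_bandwidth_gbytes_per_s(gen: int, lanes: int = 16) -> float:
    """Raw unidirectional bandwidth in GB/s, ignoring encoding and protocol overhead."""
    return PCIE_RATES_GT_S[gen] * lanes / 8.0


if __name__ == "__main__":
    for gen in sorted(PCIE_RATES_GT_S):
        bw = raw_bandwidth_gbytes_per_s(gen)
        print(f"Gen {gen}: {bw:.0f} GB/s per direction (x16, raw)")
```

Each generation doubles the per-lane rate of the one before it, with the exception of Gen 3, which moved from 5 GT/s to 8 GT/s.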

Key Bottlenecks for AI Workloads in Data Centers

Resource limitation 

Current data centers are experiencing efficiency challenges due to memory bandwidth and memory utilization. The restriction of only accessing local memory not only limits the speed of data processing but also leads to underutilization of data center memories. This occurs despite the evolution of processors to include more and faster cores. 

Latency  

Latency currently poses a significant bottleneck for most AI/ML applications. Transferring high data rates with complex modulation schemes over copper cables and backplanes requires advanced equalization techniques and algorithms such as forward error correction (FEC), which further contribute to system latency.

Energy consumption  

Power is the scarcest resource in data centers, and current technologies demand the use of power-hungry chips. An estimated 25% of a data center’s total power is used solely for point-to-point data transfers. As the need for data transfer grows, particularly with the advent of AI/ML applications, this energy consumption is predicted to rise dramatically.
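
One rough way to reason about interconnect power is that link power equals the energy spent per bit multiplied by the bit rate. The sketch below applies that relation to a hypothetical x16 link at 128 Gb/s per lane. The energy-per-bit values and option labels are illustrative assumptions, not measurements from this bulletin.

```python
def link_power_watts(energy_pj_per_bit: float, bit_rate_gbps: float) -> float:
    """Link power in watts: (pJ/bit) * (Gb/s) gives mW, so divide by 1000."""
    return energy_pj_per_bit * bit_rate_gbps / 1000.0


# Hypothetical x16 link at 128 Gb/s per lane (2048 Gb/s per direction).
BIT_RATE_GBPS = 16 * 128.0

if __name__ == "__main__":
    # Assumed, illustrative energy-per-bit values; real figures depend on the design.
    for label, pj_per_bit in [("assumed option A", 10.0), ("assumed option B", 5.0)]:
        power = link_power_watts(pj_per_bit, BIT_RATE_GBPS)
        print(f"{label}: {power:.2f} W per direction")
```

Because power scales linearly with both terms, doubling the data rate at a fixed energy per bit doubles the link power, which is why energy per bit is a natural figure of merit when comparing interconnect options.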

Scaling challenges  

Demand for data transfer and data processing keeps rising with emerging requirements and technologies, which translates directly into a need for more memory and faster memory access. Data center growth requires the network architecture to scale accordingly, so designing networks that can scale without excessive financial burden becomes very important. The ability to scale resources up or down based on demand is crucial for AI workloads, which can be highly variable.

Why PCIe Over Optics

Optical links offer higher bandwidth density compared to electrical links. Initially, PCIe interfaces were developed to be utilized over copper, DAC, and PCB interconnects. However, as data rates increase and electrical losses escalate, this approach is becoming less appealing. 

Optical links have the advantage of covering longer distances. Resource limitations, particularly memory constraints, are becoming increasingly challenging to address using the current architecture of PCIe over copper, which only permits access to local memory. Optical technology can overcome this limitation by enabling processing units to access more distant memory units in different server units or racks. This is beneficial for resource pooling or sharing over a CXL switch and other similar applications.

When it comes to energy efficiency and cost-effectiveness over longer distances, optical links excel. They are far less lossy compared to electrical links, which means they demand fewer re-timers and signal conditioning units over the same distances. Additionally, the use of low-cost, high-yield optical components could further reduce costs per distance. Copper interconnects, on the other hand, occupy a lot of space in data centers and are not suitable for dense data centers. In contrast, optical fibers are more flexible and take up less space, making them a better option for increasing density in data centers. 
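
The relationship between channel loss and retimer count can be sketched with a toy model: split the link into segments so that no segment exceeds a loss budget, and place a retimer between adjacent segments. The loss-per-meter values and the budget below are illustrative assumptions only. The model also ignores electro-optical conversion and the electrical traces at each end of an optical link.

```python
import math


def retimers_needed(length_m: float, loss_db_per_m: float, budget_db: float) -> int:
    """Estimate the retimers required so that no link segment exceeds the loss budget."""
    total_loss_db = length_m * loss_db_per_m
    segments = max(1, math.ceil(total_loss_db / budget_db))
    return segments - 1  # one retimer between each pair of adjacent segments


if __name__ == "__main__":
    BUDGET_DB = 32.0             # assumed per-segment loss budget
    COPPER_LOSS_DB_PER_M = 10.0  # assumed copper cable loss at the signaling frequency
    FIBER_LOSS_DB_PER_M = 0.0005 # assumed fiber attenuation (0.5 dB/km)

    for length_m in (1.0, 3.0, 10.0):
        copper = retimers_needed(length_m, COPPER_LOSS_DB_PER_M, BUDGET_DB)
        fiber = retimers_needed(length_m, FIBER_LOSS_DB_PER_M, BUDGET_DB)
        print(f"{length_m:>4.0f} m: copper needs {copper} retimer(s), fiber needs {fiber}")
```

Under these assumed numbers the copper link needs retimers once it reaches several meters, while fiber attenuation stays negligible over rack-scale distances.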

Finally, linear or direct drive optical links can also help to reduce latency and power consumption. Different optical architectures can be employed for PCIe over optics, and the choice affects latency. For instance, linear direct drive optics avoids an extra retimer in the link, resulting in reduced latency.

Figure 2 shows PCIe over optics use case scenarios for intra-rack and rack-to-rack data center configurations, based on requirements from OCP (Open Compute Project). These applications cover compute, storage, accelerator, and memory connectivity scenarios for NVMe and CXL enabled disaggregated data centers.

Figure 2: OCP General PCIe Connectivity Intra Rack & Rack to Rack 

Design Considerations for Enabling Optical PCIe Interfaces

The PCIe interface was not originally conceived with optical compatibility in mind. Applications of PCIe interconnects, such as CPU to CPU, GPU to GPU, and GPU to memory, were typically addressed using the current PCIe PHY and controller, from the root complex to the endpoint, via copper-based channels. Consequently, transitioning from PCIe with electrical channels to PCIe over Optics is not a straightforward process and has its own challenges.

The first challenge lies in meeting PCIe electrical compliance. This involves the necessity for clearly defining compliance specifications to ensure interoperability. Another aspect of this challenge is maintaining backward compatibility over optical links. The second challenge concerns the support for PCIe protocol over Optics. This may necessitate alterations to the existing protocol to accommodate optical technology. These changes might encompass aspects such as Rx detection (where impedance is currently used to determine if the remote electrical receiver is ready for traffic, a method not compatible with optics), management of electrical IDLE states, performance of SSC clocks with optics, and handling of sideband signals.   

A workgroup was established in August 2023 to tackle the challenges of adopting PCIe optical technologies. 草榴社区 is actively involved in the discussions, helping to advance “optical-friendly” PCIe standards.

Retimed and Non-Retimed Topologies for Optical Links over PCIe

The retimed topology is a key approach where a maximum of two retimers are permissible within an end-to-end link. Some important aspects to consider within this topology include the strategic placement and the precise quantity of retimers deployed. 
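
As a minimal sketch of the retimer constraint, the check below validates a link described as an ordered list of component names. The component labels and the function name are hypothetical, chosen only for illustration. The limit of two retimers is the one stated above.

```python
from typing import List

MAX_RETIMERS = 2  # maximum retimers permitted in an end-to-end link


def is_valid_retimed_link(components: List[str]) -> bool:
    """Check that a link runs from root complex to endpoint with at most MAX_RETIMERS retimers."""
    if len(components) < 2:
        return False
    if components[0] != "root_complex" or components[-1] != "endpoint":
        return False
    return components.count("retimer") <= MAX_RETIMERS


if __name__ == "__main__":
    # Hypothetical retimed optical link: one retimer on each side of the optical span.
    link = ["root_complex", "retimer", "optical_engine",
            "optical_engine", "retimer", "endpoint"]
    print(is_valid_retimed_link(link))  # two retimers, within the limit
```

A check like this captures only the quantity constraint. Where the retimers sit relative to the optical engines is a separate design decision.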

Conversely, the non-retimed or linear topology introduces a more complex set of challenges. This is primarily because a linear link disrupts the continuity of the path, making it more difficult to reconcile with the existing PCIe standards and compliance stipulations. Effective control of channel losses is paramount in this topology. Moreover, it may necessitate substantial alterations to the protocol layer, and potentially to the PHY layer as well. A comprehensive feasibility study with all types of optical engines is also a critical aspect of this topology.

Figure 3: Different topologies to enable PCIe over optics  

In addition to link topology, other critical elements such as form factor standardization and FEC schemes should be considered to successfully establish a PCIe link over optics. Currently, form factors such as CDFP, OSFP, QSFP, and QSFP-DD, among others, are being evaluated, with the advantages and disadvantages of each being carefully considered. The same is happening in the FEC discussions, where concatenated FEC architectures are being considered to relax the optical PMD requirements or extend reach while keeping latency low for the overall system.

The Proof is in the Pudding

PCIe over optics is essential for establishing interconnectivity among rack units, thereby enabling them to operate as a cluster. The role of PCIe is central because it acts as a controller, the digital logic that interfaces with the software. One of the major hurdles is to ensure that the transition to optical PCIe does not disrupt the control process of the software stack.

An even greater challenge is managing the physical layer and the interoperability of the electro-optical interface. 草榴社区, in collaboration with OpenLight, plays a critical role here by providing an electrical IP solution that can function alongside a photonic one. Once a universal standard is established, any vendor of photonic die will be capable of integrating PCIe. At OFC 2024, 草榴社区 and OpenLight showcased the world’s first PCIe 7.0 data rate demonstration over optics, using a linear drive approach, and also featured a PCIe 6.x over optics demo. The demonstration achieved end-to-end link BER performance orders of magnitude better than the FEC threshold, showing the feasibility of PCIe 7.0 over optics running at 128 Gbps PAM4. This performance was achieved using discrete electrical and optical components to build the PCIe over optics link. The same 草榴社区 SerDes that drives electrical PCIe links with excellent PPA and latency was not limited by this non-ideal, worst-case scenario, which demonstrates the flexibility and robustness of the 草榴社区 SerDes.
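
Two quantities in this result lend themselves to a quick calculation: the BER margin relative to the FEC threshold, expressed in orders of magnitude, and the symbol rate implied by a PAM4 bit rate. In the minimal sketch below, the measured BER and the FEC threshold are hypothetical placeholder values, since the bulletin does not state the actual figures. Only the 128 Gbps bit rate comes from the text.

```python
import math


def ber_margin_orders(measured_ber: float, fec_threshold_ber: float) -> float:
    """Orders of magnitude by which the measured BER beats the FEC threshold."""
    return math.log10(fec_threshold_ber / measured_ber)


def pam4_symbol_rate_gbaud(bit_rate_gbps: float) -> float:
    """PAM4 carries 2 bits per symbol, so the symbol rate is half the bit rate."""
    return bit_rate_gbps / 2.0


if __name__ == "__main__":
    ASSUMED_FEC_THRESHOLD = 1e-6  # placeholder pre-FEC BER limit (assumption)
    ASSUMED_MEASURED_BER = 1e-9   # placeholder measurement (assumption)

    margin = ber_margin_orders(ASSUMED_MEASURED_BER, ASSUMED_FEC_THRESHOLD)
    print(f"BER margin: {margin:.1f} orders of magnitude")
    print(f"Symbol rate at 128 Gbps PAM4: {pam4_symbol_rate_gbaud(128.0):.0f} GBaud")
```

A positive margin means the raw link errors are fewer than the FEC is designed to correct, and a larger margin leaves more headroom for non-ideal components.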

Summary

In an age defined by AI/ML and its consequent bandwidth demands, it is clear that PCIe over optics represents the future of signaling. Its development and adoption depend on the enablement of a supportive ecosystem, which 草榴社区 is actively pursuing. Ongoing interoperability demonstrations and excellent field results with PCIe 7.0 data rate and PCIe 6.x over optics help reduce integration risk and make first-pass silicon success possible.
