草榴社区

AI models are doubling in complexity every 4 to 6 months, outpacing Moore’s Law by a factor of four and driving data center infrastructure to evolve just as rapidly. Current hyperscale data center infrastructures are struggling to deliver the speed and low latency needed to process and store trillion-parameter models. New infrastructures require more storage capacity, enhanced computing resources, and faster interconnects. This is where PCIe 7.0, the latest iteration of the PCI Express standard (currently at version 0.5 of the specification), comes into play. PCIe 7.0 offers up to 512 GB/s of bandwidth and ultra-low latency, enabling interconnects to handle the massive parallel computing demands of AI workloads and helping to mitigate data bottlenecks.

Figure 1: AI clusters expanding over the years to enhance C2C connectivity and enable the computing, storage, and bandwidth needed to process trillions of LLM parameters. Taken from:

Why PCIe 7.0 for Modern AI Data Center Infrastructures

Modern AI workloads require a specialized architecture that integrates multiple accelerators working in conjunction with a central processor. Some of the most advanced architectures require up to 1,024 accelerators within a single computing unit. Because of this, the compute scale-up fabric needs the fastest possible interconnects, with high-throughput I/O networks linking hundreds of accelerators, in order to train these AI models effectively.

PCI-SIG announced PCIe 7.0 technology in 2022, with plans to release the full specification by 2025 (version 0.5 is currently available). This development aims to meet the substantial bandwidth demands of data-intensive applications and markets, including AI/ML, networking at 1.6T/800G Ethernet, HPC, and quantum computing in HPC data centers. PCIe 7.0 will provide a low-latency, low-power, and reliable link between accelerators, processors, NICs, and other components, ensuring efficient connectivity for high-performance computing environments.

Figure 2: PCIe 7.0 will enable all key interconnects in the AI/ML Scale Up fabric with the bandwidth and secure data transfers needed to meet AI’s demands

How PCIe 7.0 Enables Next-Gen AI and HPC SoCs

PCIe 7.0 represents a significant advancement in hardware infrastructure for AI and HPC, offering several key benefits that cater to the demands of relentless innovation and unprecedented data sets:

  1. Increased Bandwidth: PCIe 7.0 doubles the bandwidth of PCIe 6.0, reaching speeds of up to 512 GB/s bi-directionally with 16 lanes of 128 GT/s.  This enhanced bandwidth is crucial for handling large volumes of data quickly and efficiently, which is critical for AI and HPC applications.
  2. Low Latency: With improved signaling rates, PCIe 7.0 reduces latency, which is vital for real-time processing and responsiveness in AI algorithms and for high-speed data processing in HPC.
  3. Compatibility and Scalability: PCIe 7.0 maintains backward compatibility with previous PCIe generations, ensuring interoperability with existing hardware while offering scalability for future upgrades. This is crucial for seamlessly integrating new technologies into existing AI and HPC infrastructures.
  4. Energy Efficiency: Despite the increased performance, PCIe 7.0 aims to maintain or improve energy efficiency, critical for reducing overall operational costs and the environmental impact in data centers and large-scale computing facilities.
  5. Advanced Features: PCIe 7.0 introduces new features and optimizations that further enhance its utility in demanding applications, including improved lane margining capabilities, enhanced error detection and reporting mechanisms, and support for emerging technologies such as CXL.
  6. Channel Reach and Signal Integrity Considerations: The target channel reach for PCIe 7.0 remains the same as PCIe 6.0, with 4”-14” system routing and 2”-4” AIC routing in single connection topology, and pad-to-pad channel loss of up to -36 dB. To minimize insertion loss and reflection in the Root Complex reference package, improvements are being made to connector insertion loss and return loss, PCB loss, and via insertion and return loss, along with minimizing crosstalk.
    • The Reference Transmitter is specified as a 4-tap Tx Equalization scheme, and further studies are required on link margin sensitivity to tap coefficient resolution and Tx presets. Transmitter and Ref clock jitter specs are almost half those of PCIe 6.0, calling for more precise and iterative approaches to chip-level, board, and package co-design.
    • The Reference Receiver consists of a proposed reference CTLE and ADC-based Rx architecture. Specifications for the PAM-4 128 Gbps stressed eye methodology, jitter tolerance, calibration channel, and Rx calibration eye mask are all yet to be defined. The reference package models for Root Complex (RC) and End Point (EP) will also be defined.
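The 512 GB/s figure quoted in item 1 can be checked with simple arithmetic. The sketch below is illustrative only: it assumes each transfer carries one raw bit per lane (so GT/s is treated as Gb/s), and it ignores FLIT, FEC, and CRC overhead, so real payload throughput will be lower. The function name `raw_bandwidth_gbytes` is hypothetical.

```python
def raw_bandwidth_gbytes(rate_gt_per_s: float, lanes: int = 16) -> dict:
    """Raw link bandwidth in GB/s, ignoring protocol overhead.

    Assumes one raw bit per transfer per lane, so GT/s is treated as Gb/s.
    """
    per_direction = rate_gt_per_s * lanes / 8  # bits -> bytes
    return {
        "per_direction": per_direction,
        "bidirectional": 2 * per_direction,
    }


pcie6 = raw_bandwidth_gbytes(64)    # PCIe 6.0: 64 GT/s per lane
pcie7 = raw_bandwidth_gbytes(128)   # PCIe 7.0: 128 GT/s per lane

print("PCIe 6.0 x16:", pcie6)
print("PCIe 7.0 x16:", pcie7)
```

Under these assumptions a x16 PCIe 7.0 link carries 256 GB/s in each direction, which is where the 512 GB/s bidirectional number comes from, and it is twice the corresponding PCIe 6.0 figure.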

Enhancements in PCIe 7.0 Connectors and the Transition to Optical 草榴社区

PCIe CEM connectors, developed by PCI-SIG and first introduced in 2000, are crucial for connecting motherboards with add-in cards (AICs) and riser cards. They support various modules like SSDs for storage, GPUs for graphics, NICs for network connectivity, and ML/DL or hybrid computing modules. For PCIe 7.0 CEM connectors, the focus is on mitigating reflection and crosstalk, ensuring low cable loss and clean conductor terminations, and minimizing skew and periodic resonance. PCIe 7.0 connectors and cables have strict signal integrity requirements, with new metrics like Return Loss excursion being discussed to improve signal quality and reliability at higher speeds.

Additionally, the formation of the PCIe Optical Workgroup by PCI-SIG indicates a move beyond the limitations of copper signaling, particularly with CopprLink External Cable, to embrace optical solutions. Optical cabling was recently introduced to PCI-SIG, generating excitement about extending the physical reach of compute networks. This technology offers advantages like lower latency and enhanced thermal management capabilities. 

The dual focus on optical PCIe links includes adapting logical communication schemes at the protocol layer while introducing new form factors with better thermal management and optimized optical links at the physical layer. These advancements aim to meet the growing demands for speed, reliability, and efficiency in high-performance computing and networking. The transition to the PCIe standard at 128 Gbps marks a significant evolution in chip design, bringing expanded capabilities, cache coherence, and new design challenges, including:

  1. Expanded Capabilities: Optical links enable expanded ranges and higher data rates, surpassing the constraints of copper. This facilitates enhanced performance with reduced power consumption and latency.
  2. Cache Coherence: The integration of CuLink and optical links at 128Gbps SerDes & Controller supports cache coherence. This allows efficient resource sharing between processors and accelerators, optimizing overall system performance.
  3. Design Challenges: Ensuring signal integrity and power integrity at 128Gbps is critical. Design margins are crucial to compensate for intra-pair skew and lane-to-lane skew in these high-speed links.
  4. Behavioral Receiver Model: The Rx model at 128 Gbps incorporates advanced features such as a more capable feed-forward equalizer (FFE) and a higher tap count decision feedback equalizer (DFE). Real-world designs are expected to exceed minimum requirements to achieve the target Bit Error Rate (BER) across all practical PVT (Process, Voltage, Temperature) conditions.
  5. Stress Testing and Validation: Techniques for generating stressed stimulus signals are essential for validating these advanced receivers. This includes upgrades to support PAM4 modulation and new channel and test requirements, building upon earlier standards.
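To make the skew and PAM4 points above more concrete, here is a minimal sketch of how tight the timing becomes at 128 Gbps. PAM4 carries two bits per symbol, so the symbol rate is half the bit rate and the unit interval (UI) follows from it. The Gray-coded bit-to-level mapping shown is a common illustrative choice and is an assumption here, not something taken from the PCIe 7.0 specification; all function names are hypothetical.

```python
# Illustrative Gray-coded mapping of bit pairs to PAM4 levels 0..3.
# Adjacent levels differ by one bit. This mapping is an assumption.
GRAY_PAM4 = {(0, 0): 0, (0, 1): 1, (1, 1): 2, (1, 0): 3}


def pam4_encode(bits):
    """Group a bit sequence into pairs and map each pair to a PAM4 level."""
    if len(bits) % 2:
        raise ValueError("PAM4 needs an even number of bits")
    return [GRAY_PAM4[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]


def unit_interval_ps(bit_rate_gbps, bits_per_symbol=2):
    """Symbol period in picoseconds for a given per-lane bit rate."""
    baud_gbd = bit_rate_gbps / bits_per_symbol  # 128 Gbps PAM4 -> 64 GBd
    return 1000.0 / baud_gbd


def skew_in_ui(skew_ps, bit_rate_gbps):
    """Express a skew value as a fraction of one unit interval."""
    return skew_ps / unit_interval_ps(bit_rate_gbps)


print(pam4_encode([0, 0, 0, 1, 1, 1, 1, 0]))
print(unit_interval_ps(128))
print(skew_in_ui(3.125, 128))
```

With these assumptions one UI at 128 Gbps is 15.625 ps, so even a few picoseconds of intra-pair skew consumes a sizeable fraction of the symbol period. This is one way to see why design margin for skew matters at this speed.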

World’s First Complete IP Solution for PCIe 7.0

While the standard is still in flux, 草榴社区 recently announced the world’s first complete IP solution for PCIe 7.0, including Controller, IDE Security Module, PHY, and Verification IP. This solution paves the way for ecosystem connectivity at these lightning speeds.

PCIe 7.0 IP TX/RX performance showcases rev 0.5 compliance

At DesignCon 2024, 草榴社区 showcased PCIe 7.0 IP TX and RX performance with excellent RLM. The TX-to-RX loopback ran at 128 Gbps over a long-reach channel, demonstrating the robustness of the IP with a pre-FEC BER multiple orders of magnitude better than the spec.
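For readers unfamiliar with RLM, it is the PAM4 level separation mismatch ratio: a value of 1.0 means the four signal levels are perfectly evenly spaced, and lower values indicate compressed eyes. The sketch below follows the formulation used in IEEE 802.3 for PAM4 transmitters. Whether the PCIe 7.0 specification measures RLM in exactly this way is an assumption, and the voltage values are made up for illustration.

```python
def level_separation_mismatch_ratio(v0, v1, v2, v3):
    """RLM from the four mean PAM4 levels, lowest (v0) to highest (v3).

    Follows the IEEE 802.3 style definition; its applicability to
    PCIe 7.0 measurements is an assumption.
    """
    v_mid = (v0 + v3) / 2
    es1 = (v1 - v_mid) / (v0 - v_mid)  # normalized inner level, lower half
    es2 = (v2 - v_mid) / (v3 - v_mid)  # normalized inner level, upper half
    return min(3 * es1, 3 * es2, 2 - 3 * es1, 2 - 3 * es2)


# Evenly spaced levels give the ideal value.
ideal = level_separation_mismatch_ratio(-3.0, -1.0, 1.0, 3.0)

# Hypothetical case: the lower inner level sits slightly too close to the middle.
compressed = level_separation_mismatch_ratio(-3.0, -0.9, 1.0, 3.0)

print(ideal, compressed)
```

In this example the evenly spaced levels yield an RLM of 1.0, while the shifted inner level lowers it to about 0.9.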

To continue highlighting this technology, we also showcased PCIe 7.0 at PCI-SIG DevCon 2024, including TX and RX performance in a loopback configuration and the industry’s first PCIe 7.0 interoperability demonstrations over electrical cable channels such as DAC and backplane channels. Additionally, we showcased a successful root complex to endpoint connection showing FLIT transfers using EQ bypass mode.

Summary

PCIe 7.0 enables designers to address the escalating demands of AI and HPC environments, providing higher bandwidth, lower latency, improved energy efficiency, and compatibility with existing infrastructure. System designers need to achieve much-needed improvements in data throughput to aid the deployment of artificial intelligence inference engines and co-processor topologies in the data center. This requires new techniques in simulation as well as post-silicon validation. Innovative simulation, design, test, and measurement methodologies are required for this PAM-4 inflection point. These include correlation between simulation and validation, design practices for PCIe over optical and electrical cables, noise reduction in the face of signal integrity complications, and techniques to maintain signal integrity and minimize issues like reflection and crosstalk.

The move towards PCIe at 128 Gbps represents a paradigm shift in high-speed interconnect technology. It introduces new challenges and opportunities in IP design aimed at enhancing performance, efficiency, and reliability in modern computing and networking environments. 草榴社区 is at the forefront of this technology revolution with the industry's first complete pre-verified PCIe 7.0 IP solution. The standards-based solution, consisting of PHY, controller, IDE security module, and verification IP, provides secure data transfers of up to 512 GB/s bidirectional in a x16 configuration to mitigate data bottlenecks. With over two decades of PCI Express experience, 草榴社区 offers designers an early start on next-generation HPC and AI SoCs to accelerate the path to production.
