
Multi-die designs are emerging as a key solution to enhance performance, scalability, and flexibility for modern computing in data centers. By disaggregating traditional monolithic designs into smaller, heterogeneous or homogeneous dies (also called chiplets), engineers can optimize each component for specific tasks, leading to significant gains in efficiency and capability. This modular approach is particularly beneficial for data centers, which require high-performance, reliable, and scalable systems to handle vast amounts of data and complex AI workloads. 

The complex, ever-evolving architectures of hyperscale data centers can use several types of dies in multi-die designs:

  • Compute dies handle core processing tasks, including general-purpose CPUs, GPUs for parallel processing, and specialized accelerators for AI and ML
  • Memory dies provide the necessary storage and bandwidth for data-intensive applications, supporting various types of memory such as DDR, HBM, and emerging non-volatile technologies
  • IO dies manage input and output operations, facilitating data transfer between compute designs and external interfaces like memory, networking, and storage, ensuring high data throughput and low latency 
  • Additionally, custom dies can meet specific requirements or optimize particular functions, including security designs for enhanced data protection, power management designs for efficient energy use, and networking designs for advanced communication capabilities. 

This article explains how multi-die designs that combine PCIe and Ethernet with UCIe IP maximize bandwidth and performance for scaling modern AI data center infrastructure up and out.

Why Scaling Up and Scaling Out Are Key for Data Center Connectivity

One of the most significant challenges in building AI infrastructure is interconnecting tens of thousands of servers spread across multiple data centers into a single network capable of handling AI workloads. An AI data center is complex: it contains multiple CPUs and accelerators, various switches, numerous NICs, and a host of other devices. Connecting these components seamlessly requires an efficient network, and this is where scale-up and scale-out technologies become key. IO disaggregation provides an opportunity to address both strategies. In a scale-up scenario, PCIe, paired with UCIe IP for die-to-die connectivity, can act as the internal network fabric. In a scale-out scenario, Ethernet paired with UCIe IP can enable high-speed, low-latency links between servers.


Scaling Up and Out in a Nutshell

Scaling up, or vertical scaling, involves boosting the resources of a single server by adding more CPUs, expanding memory, or enhancing storage capacity. This approach simplifies the architecture and reduces latency, because all resources are contained within one machine. Central to scaling up is PCIe technology, which acts as the internal network fabric. The latest iteration of this technology, PCIe 7.0, can connect CPUs, GPUs, NICs, storage drives, and other peripherals with low latency and high bandwidth, ensuring efficient communication within the server.
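To put the PCIe bandwidth claim in numbers, the following is a minimal sketch that estimates raw one-direction link bandwidth from the published per-lane signaling rates of each PCIe generation. The function name and the x16 default are illustrative choices, and the result ignores encoding, FLIT, and protocol overhead, so real throughput will be lower.

```python
# Published raw per-lane signaling rates (GT/s) for each PCIe generation.
PCIE_GT_PER_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0, 6: 64.0, 7: 128.0}


def raw_bandwidth_gbytes(gen: int, lanes: int = 16) -> float:
    """Raw one-direction bandwidth in GB/s (hypothetical helper).

    Treats 1 GT/s as 1 Gb/s of raw bits per lane and ignores all
    encoding and protocol overhead.
    """
    return PCIE_GT_PER_S[gen] * lanes / 8  # 8 bits per byte


if __name__ == "__main__":
    for gen in (5, 6, 7):
        print(f"PCIe {gen}.0 x16: {raw_bandwidth_gbytes(gen):.0f} GB/s per direction")
```

Under these assumptions a PCIe 7.0 x16 link comes out at 256 GB/s in each direction, double the PCIe 6.0 figure.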

On the other hand, scaling out, or horizontal scaling, distributes the workload across multiple servers, creating a network of machines that work in tandem. This approach is cost-effective, provides redundancy, and offers flexibility for handling growing workloads. However, it introduces complexity in network configuration and management, and communication between multiple machines can add latency. Here, Ethernet technology and the upcoming Ultra Ethernet standard become vital, providing the high-speed, low-latency communication needed to link servers across a data center. Emerging standards are also being discussed to support high-speed links between AI accelerators and switches, ensuring efficient data transfer and coordination.

Figure 1: Depiction of the Key Interconnect Technologies Needed for Scaling Data Center Architectures

Multi-Die Designs with Ethernet and PCIe

As shown in Figure 1, there are many opportunities for multi-die designs to enable scaling up and out. Multi-die designs with PCIe, Ethernet, and UCIe IP are essential to address time-to-market, cost, and risk challenges while offering full architectural flexibility. Let’s dive into the main uses of IO chiplets in multi-die designs: very large AI training chips, switch SoCs, and retimers.

1. Very Large AI Training Chips 

AI chips must become significantly more efficient at both computation and data management to handle today's massive models. Specialized AI training chips are designed to meet these immense computational and data processing demands, integrating multiple processing units, memory, and interconnects on a single silicon die to deliver high performance and efficiency. This is where multi-die designs integrating 40G UCIe and 224G Ethernet step in to enable efficient AI training. Instead of relying on thousands of huge GPUs, data centers could run their AI training with significantly less beachfront in SoCs while achieving higher bandwidth and extended reach with minimal latency and power overhead.

224G Ethernet PHY IP provides a robust and customizable interface. With CEI-224G in development, achieving 224Gbps per lane while maintaining ecosystem interoperability and optimizing power is critical for AI training operations. Additionally, UCIe IP can deliver up to 40Gbps of high-speed, low-latency, energy-efficient data transfers across multiple dies, significantly enhancing the scalability and modularity of these chips.
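As a rough sizing exercise for the two rates quoted above, the sketch below estimates how many UCIe modules a die-to-die link would need to carry the traffic of a group of 224G Ethernet lanes. Several things here are assumptions rather than statements from this article: each 224G-class lane is taken to carry 200 Gb/s of payload, the 40Gbps UCIe figure is treated as a per-lane rate, module widths of 16 lanes (standard package) and 64 lanes (advanced package) follow the UCIe specification, and all protocol overhead is ignored. The function name is hypothetical.

```python
import math

UCIE_GBPS_PER_LANE = 40          # rate quoted in the text, assumed to be per lane
ETH_PAYLOAD_GBPS_PER_LANE = 200  # assumption: payload of one 224G-class lane


def ucie_modules_needed(eth_lanes: int, lanes_per_module: int) -> int:
    """Modules required (per direction) to match the Ethernet payload rate.

    lanes_per_module is 16 for a standard-package module or 64 for an
    advanced-package module. Raw rates only; no overhead is modeled.
    """
    eth_gbps = eth_lanes * ETH_PAYLOAD_GBPS_PER_LANE
    module_gbps = lanes_per_module * UCIE_GBPS_PER_LANE
    return math.ceil(eth_gbps / module_gbps)


if __name__ == "__main__":
    # Eight Ethernet lanes correspond to 1.6 Tb/s of payload.
    print("x16 modules:", ucie_modules_needed(8, 16))
    print("x64 modules:", ucie_modules_needed(8, 64))
```

With these assumptions, 1.6 Tb/s of Ethernet payload fits in a single 64-lane module but needs three 16-lane modules, which illustrates why the choice of package technology affects how much die edge the die-to-die interface consumes.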

Figure 2: 224G/UCIe Multi-Die Design for AI Training Chips

2. 100T Switch SoCs with Electrical or Optical Co-packaged Interfaces

AI accelerators are of course a big part of the equation, but how do you connect them together? It takes a lot of switches. Switch SoCs are emerging as another solution for scaling out AI and HPC data centers while maintaining power efficiency, and they can provide either an electrical reach of 3-4 meters or an optical reach of 10-100 meters. These SoCs integrate both electrical and optical interconnects directly into CPUs and GPUs, enabling the scalable and efficient network optimizations needed to resolve connectivity bottlenecks as cluster sizes rapidly grow. Electrical I/O supports high bandwidth density and low power but is limited to short reaches, whereas optical interconnects can extend reach significantly. Pluggable optical transceiver modules can increase reach, but at cost and power levels that are unsustainable for large-scale AI workloads. In contrast, co-packaged optical I/O solutions can support higher bandwidths with improved power efficiency, low latency, and extended reach, which is precisely what AI/ML infrastructure scaling demands.

Optical and electrical IOs can support multiple high-speed channels running at 224Gbps while consuming significantly less power than traditional pluggable QSFP-DD or OSFP transceiver modules. Furthermore, integrating advanced standards like UCIe and high-speed Ethernet addresses the limitations of traditional interconnects by facilitating high-speed, low-latency communication with the main die.
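To give a feel for the scale behind a "100T" switch, here is a small back-of-the-envelope sketch. It assumes that "100T" means 102.4 Tb/s of aggregate switching capacity and that each 224G-class lane carries 200 Gb/s of payload; neither number is stated in this article, and the function names are hypothetical.

```python
LANE_PAYLOAD_GBPS = 200     # assumption: payload of one 224G-class lane
SWITCH_CAPACITY_GBPS = 102_400  # assumption: "100T" taken as 102.4 Tb/s


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def serdes_lanes(switch_gbps: int, lane_gbps: int = LANE_PAYLOAD_GBPS) -> int:
    """Number of lanes needed to expose the full switch capacity."""
    return ceil_div(switch_gbps, lane_gbps)


def port_count(switch_gbps: int, port_gbps: int) -> int:
    """Number of full-rate ports of a given speed the capacity supports."""
    return switch_gbps // port_gbps


if __name__ == "__main__":
    print("lanes:", serdes_lanes(SWITCH_CAPACITY_GBPS))
    print("1.6T ports:", port_count(SWITCH_CAPACITY_GBPS, 1_600))
```

Under these assumptions the switch needs 512 high-speed lanes, or 64 ports at 1.6 Tb/s each, which is why per-lane power and reach dominate the electrical-versus-optical decision.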

Figure 3: 100T Optical/Electrical Switch SoCs

3. High BW IO for Retimers or Extended Reach 

Retimers and extended reach solutions are also becoming indispensable due to their critical role in maintaining signal integrity and reducing latency over long distances. Retimers support advanced protocols like PCIe and CXL, ensuring seamless integration into modern data center architectures and enabling substantial memory expansion without requiring an overhaul of existing systems. This compatibility is essential for handling memory-intensive AI inference operations and overcoming signal integrity challenges posed by newer standards like PCIe 7.0.

The convergence of PCIe and CXL protocols is reshaping data center architectures by enabling memory pooling and dynamic, cost-effective memory allocation. For retimers to be effective in this new landscape, they must be protocol-aware, capable of adapting to the rapidly evolving CXL standards. Features such as on-chip diagnostics, secure boot capabilities, and low power consumption are critical to ensuring security, ease of debugging, and sustainability. The industry's shift towards multi-die designs further underscores the necessity for versatile, high-bandwidth I/O solutions, which simplify system design and accelerate time-to-market. These technological advancements are not only crucial for meeting the current demands of AI and high-performance computing but also for future-proofing data centers against the ever-increasing computational and bandwidth requirements.

Figure 4: Retimers or Extended Reach IO Design

Example of a Multi-Die Implementation with Ethernet, PCIe and UCIe IP

Figure 5 shows an example of a multi-die design with a 224G Ethernet PHY and integrated 1.6T PCS and MAC Ethernet controllers, PCIe 6.x or 7.0 PHYs and controllers, security IP, sensors, DFT, and UCIe PHY and controller IP. This design can be reconfigured to deliver 1.6T/3.2T/6.4T of throughput over a variety of channels, including 45dB LR, MR, and VSR Ethernet as well as PCIe 6.x and 7.0 reaches.

  • 45dB Long Reach Ethernet & UCIe retimer die-to-die design
  • Combo PCIe/CXL/Ethernet and UCIe die-to-die design
  • 1.6T/3.2T/6.4T scalable IO design for switches   

Figure 5: Multi-Die Design Block Diagram

This multi-die design supports a configurable number of lanes for 224G data transmission in both directions, accommodating up to 45dB of insertion loss. It aims to meet the increasing demands of AI infrastructure for higher bandwidth, reduced power consumption, and extended reach. This example implementation enhances scalability for CPU/GPU cluster connectivity and for innovative compute architectures, such as coherent memory expansion and resource disaggregation.
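One way to reason about the 45dB figure is as a loss budget: to a first approximation, the insertion losses (in dB) of cascaded channel segments add, and the link closes only if the total stays within what the PHY can tolerate. The sketch below illustrates that bookkeeping. The segment names and loss values are entirely hypothetical and are not taken from this article or any specification; reflections and crosstalk are ignored.

```python
LR_BUDGET_DB = 45.0  # long-reach insertion-loss figure quoted in the text


def channel_margin_db(segment_losses_db: dict[str, float],
                      budget_db: float = LR_BUDGET_DB) -> float:
    """Remaining margin in dB after summing per-segment insertion losses.

    A negative result means the channel exceeds the budget.
    """
    return budget_db - sum(segment_losses_db.values())


# Hypothetical end-to-end channel; every value here is illustrative only.
example_channel = {
    "tx_package": 4.0,
    "host_pcb": 12.0,
    "connectors": 3.0,
    "cable": 18.0,
    "rx_package": 4.0,
}

if __name__ == "__main__":
    margin = channel_margin_db(example_channel)
    status = "within" if margin >= 0 else "over"
    print(f"margin: {margin:.1f} dB ({status} budget)")
```

In practice, channel compliance is evaluated with full simulation rather than a simple sum, so this is only a first-pass screening aid.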

Summary

Scaling bandwidth for multi-die designs requires integrating high-speed interfaces like PCIe and Ethernet, along with UCIe IP and link health monitoring features. Synopsys provides high-quality, complete IP solutions for UCIe up to 40Gbps with signal integrity monitors and testability features, 224G Ethernet, and PCIe 7.0, enabling maximum bandwidth, low latency, and scalability. Synopsys IP solutions for multi-die designs comply with the evolving standards and have achieved interoperability with ecosystem products and silicon successes across multiple technologies, making them a low-risk path to the next generation of data center AI chips.

Synopsys' comprehensive and scalable multi-die solution, encompassing EDA and IP products, enables early architecture exploration, fast software development and validation, efficient die/package co-design, robust die-to-die connectivity, and improved manufacturing and reliability.
