Synopsys

Emerging Applications for CXL

Gary Ruggles, Sr. Product Marketing Manager, Synopsys

Introduction

In the previous article, Introduction to the Compute Express Link (CXL) Standard, we described the fundamentals of CXL and the initial target applications. This article highlights some of the emerging applications that the industry is discovering for CXL, and how they are being enabled by the rapidly evolving CXL specification.

Consider that at the time of this article, membership in the CXL Consortium has grown to over 120 members, making it clearly the largest of the new high-speed interconnect/coherency standard consortia, eclipsing the Cache Coherent Interconnect for Accelerators (CCIX) Consortium with about 50 members, Gen-Z with approximately 70 members, and OpenCAPI with around 38 members. This is remarkable, considering that the CXL Consortium was formed only about 18 months ago, compared to nearly four years ago for the others. CXL is clearly gaining traction in the industry and is targeting many exciting new emerging applications.

Figure 1 shows a snapshot of the CXL specification's development, from the initial announcement of the CXL Consortium and the 1.0 specification last March to the present day. Looking at the figure, you can get a sense of how quickly things are changing.

Figure 1: CXL’s major events and specification development timeline

Some of the major developments include an update to CXL 1.1, followed rapidly by 27 errata to correct and/or clarify aspects of the 1.1 specification. One of the biggest developments shown in Figure 1 is the appointment of new members to the CXL Board of Directors, including IBM, AMD, and Arm. Together with Intel, one of the original founders of CXL, this puts the four major CPU makers squarely behind CXL and ensures that companies who invest in developing CXL-enabled products will have a rich ecosystem of Host platforms to choose from that extends beyond just x86 systems. Making CXL even more compelling, Intel revealed at its recent Architecture Day 2020 that its upcoming CPU platform, Sapphire Rapids, is coming out in 2021 and includes support for CXL 1.1.

The timeline also highlights the announcement of a Memorandum of Understanding (MOU) between the CXL and Gen-Z Consortiums. In the previous article, we looked at how several protocols could be used together within large systems with memory coherency to handle CPU-to-CPU, CPU-to-attached-Device, and longer-distance chassis-to-chassis requirements (potentially without coherency). With the Gen-Z MOU in place, this picture is now clearer: we see CXL becoming the dominant solution inside servers, with Gen-Z offering connectivity from box to box or even rack to rack, leveraging its ability to use Ethernet physical layers to achieve longer reach than CXL can with PCIe 5.0 PHYs.

Next Generation of CXL

By the time this article is published, it is likely that the next generation of the CXL specification will be finalized and released, potentially enabling a number of new applications. This new specification should be available for free download from the CXL Consortium website, and it has been available to Adopter members while in development. The next generation of CXL is all about enabling storage applications, and it includes several new features that make CXL even more powerful for them. While most of these features are not yet public and can't be disclosed here, at least one has been discussed in various forums and can be gleaned from a recent patent application filed by Intel (United States Patent Application 20200065290): the introduction of switching support, which takes the next generation of CXL beyond a simple point-to-point interface.

New Memory Applications with CXL

The introduction of switching to CXL means that multiple storage devices can be connected to a single Host, allowing the Host to access their memory coherently as part of its memory space. This was not possible in CXL 1.1, which allowed only a single Host-to-Device connection (no fanout). With the next generation of CXL, memory attached to multiple downstream devices may be shared across multiple Hosts, and that memory can be split among those Hosts as needed for a particular application.
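
To illustrate the idea, here is a minimal conceptual sketch, in Python, of how pooled capacity behind a CXL switch might be carved up among Hosts. It is not based on any published CXL API; the class and method names are invented purely for illustration.

```python
# Conceptual sketch only: how a CXL switch might let multiple Hosts partition
# capacity from several attached memory devices. All names are hypothetical.

class MemoryDevice:
    def __init__(self, name, capacity_gb):
        self.name = name
        self.capacity_gb = capacity_gb
        self.free_gb = capacity_gb

class CXLSwitch:
    def __init__(self, devices):
        self.devices = devices          # downstream CXL.mem devices
        self.assignments = {}           # host -> list of (device, size_gb)

    def assign(self, host, size_gb):
        """Carve 'size_gb' out of the attached devices for 'host'."""
        for dev in self.devices:
            take = min(dev.free_gb, size_gb)
            if take:
                dev.free_gb -= take
                self.assignments.setdefault(host, []).append((dev.name, take))
                size_gb -= take
            if size_gb == 0:
                return True
        return False                    # not enough pooled capacity left

# Example: two Hosts sharing two memory expanders through one switch.
switch = CXLSwitch([MemoryDevice("expander0", 256), MemoryDevice("expander1", 256)])
switch.assign("host_A", 320)            # spans both devices
switch.assign("host_B", 128)
print(switch.assignments)
```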

Emerging memories, which are often transactional, may have non-deterministic and asymmetric read/write timing. This makes them poorly suited to sharing the DDR bus with DRAM. Essentially, only homogeneous DIMMs can share the DDR bus, typically requiring the following (a simple compatibility check along these lines is sketched after the list):

  • Same generation of DDR
  • Same speed grades and timing to maintain bus efficiency
  • Same device geometry to enable interleaving across all channels
  • Same power and thermal envelopes
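
To make the homogeneity constraint concrete, here is a minimal Python sketch that checks whether a set of DIMMs could share one DDR channel under the rules listed above. The DIMM fields and the 1 W power tolerance are hypothetical, chosen only for illustration, not taken from any real platform interface.

```python
# Illustrative only: a simple check of the homogeneity constraints listed above.

from dataclasses import dataclass

@dataclass(frozen=True)
class DIMM:
    generation: str      # e.g. "DDR4", "DDR5"
    speed_mts: int       # data rate in MT/s
    geometry: str        # e.g. "2Rx8 16GB"
    power_w: float       # power envelope in watts

def can_share_ddr_bus(dimms):
    """All DIMMs on one DDR channel must match in generation, speed,
    geometry, and power/thermal envelope."""
    first = dimms[0]
    return all(
        d.generation == first.generation
        and d.speed_mts == first.speed_mts
        and d.geometry == first.geometry
        and abs(d.power_w - first.power_w) < 1.0
        for d in dimms
    )

dram = DIMM("DDR5", 4800, "2Rx8 16GB", 5.0)
persistent = DIMM("PMEM", 3200, "1Rx4 128GB", 12.0)
print(can_share_ddr_bus([dram, dram]))        # True  - homogeneous
print(can_share_ddr_bus([dram, persistent]))  # False - needs a CXL attach instead
```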

This is a significant limitation of the current homogeneous-DIMM approach to memory expansion. CXL provides an exciting alternative for memory expansion that doesn't rely on the DDR bus or on adding more DIMMs, enabling a true heterogeneous memory-attach solution. As shown in Figure 2, future memory devices can be connected to system-on-chips (SoCs) using CXL, which creates a standard, media-independent connection to virtually any memory type, including DDR, LPDDR, persistent memory, etc. Since each CXL link can connect to a controller optimized for a particular memory, performance characteristics such as persistence, endurance, bandwidth, and latency can be optimized and matched to each memory type.

As SoCs with multiple CPUs need to access more memory and more types of memory, this CXL-based memory approach can be ideal: memory on traditional DDR interfaces can be augmented with memory on CXL links. In addition to allowing different memory types, CXL memory expansion also uses fewer pins than adding DDR interfaces. A x4 CXL interface with 16GB/s of bandwidth in each direction requires only 16 pins. If the CXL-based memory module form factor supports a x8 CXL link, each CXL link on the SoC would still require only 32 pins, and the bandwidth would go up to 32GB/s in each direction.
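
The arithmetic behind these pin and bandwidth figures is simple enough to sketch, assuming PCIe 5.0 signaling at 32 GT/s per lane and four signal pins per lane (one differential pair each for transmit and receive), and ignoring sideband, clock, and power pins as well as encoding overhead:

```python
# The arithmetic behind the pin and bandwidth figures quoted above.
# Assumes PCIe 5.0 signaling at 32 GT/s per lane and 4 signal pins per lane;
# sideband/clock/power pins and encoding overhead are ignored for simplicity.

GT_PER_LANE = 32          # GT/s per lane at PCIe 5.0 rates
PINS_PER_LANE = 4         # TX+/TX- and RX+/RX-

def cxl_link(lanes):
    bandwidth_gbps = GT_PER_LANE * lanes / 8   # approximate GB/s per direction (raw)
    pins = PINS_PER_LANE * lanes
    return bandwidth_gbps, pins

for lanes in (4, 8, 16):
    bw, pins = cxl_link(lanes)
    print(f"x{lanes}: ~{bw:.0f} GB/s per direction, {pins} signal pins")
# x4:  ~16 GB/s per direction, 16 signal pins
# x8:  ~32 GB/s per direction, 32 signal pins
# x16: ~64 GB/s per direction, 64 signal pins
```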

While the CXL links on the SoC on the right side of Figure 2 could each be connected to a separate CPU, if the SoC design includes an embedded CXL switch (taking advantage of the next generation of CXL's switching capability), then the various Host CPUs could access multiple memory devices on different CXL links, opening up even more possibilities for the CXL memory expansion approach.

Figure 2: From today’s SoC with 2 channels of DDR memory to Future SoCs with additional CXL memory

CXS: A New Interface for CXL

While there is no MOU with CCIX, there are rumblings in the industry of cooperative efforts under consideration, including one discussed in the next section, referred to as CCIX over CXL. Before getting there, it is important to understand the new application-side interface that has recently been enabled by Arm and Synopsys¹.

The standard application-side interface for CXL comprises three separate interfaces, one for each of the CXL protocols: CXL.io, CXL.cache, and CXL.mem, as shown in Figure 3a. With the CXS streaming interface described below, it now becomes possible to consider a CXL controller built around a CXS interface instead.

Arm defined a new streaming interface, called CXS, to bridge from a CCIX controller to its Coherent Mesh Network (CMN-600), as shown in Figure 3b. CCIX controllers implementing Arm's CXS interface, such as the DesignWare® CCIX IP, can plug directly into the Arm CMN-600. Recently, Arm collaborated with Synopsys to update the CXS interface definition and enable Arm's next-generation Coherent Mesh Network to connect directly into a new version of a CXL controller with a CXS interface. To make this work, some of the CXL link-layer functionality is moved out of the CXL controller and taken over by logic in the CMN or other application logic. In its place, a CXS interface block is added in lieu of the CXL.cache and CXL.mem interfaces, and the result, shown in Figure 3c, is a design that looks very similar to the CCIX controller in Figure 3b.

 

Figure 3: A comparison of interface options for CXL and CCIX controllers using the CXS interface

Figure 3c shows a CXL controller that now receives flits via the CXS interface; these are combined with the CXL.io flits via the ARB/MUX. Basic operation is unchanged, and the CXL controller does not really need to comprehend what is in the flits that come down the left side of the controller via the CXS interface. This enables not only the direct connection to Arm's next-generation scalable Neoverse Coherent Mesh Network, but also another interesting possibility discussed in the next section: CCIX over CXL.
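
As a purely conceptual illustration of that role (and not the arbitration scheme defined in the CXL specification), the sketch below interleaves flits from a CXS-side queue with CXL.io flits onto a single ordered stream, treating the CXS flits as opaque payloads:

```python
# Conceptual sketch of the ARB/MUX role described above: flits arriving on the
# CXS side are interleaved with CXL.io flits onto one link. This is not the
# CXL specification's arbitration algorithm, only an illustration of the
# controller forwarding CXS flits without interpreting them.

from collections import deque

def arb_mux(cxs_flits, cxl_io_flits):
    """Interleave two flit queues onto one ordered stream (round-robin)."""
    queues = deque([deque(cxs_flits), deque(cxl_io_flits)])
    stream = []
    while any(queues):
        q = queues[0]
        if q:
            stream.append(q.popleft())   # forward the flit unmodified ("opaque")
        queues.rotate(-1)                # alternate between the two sources
    return stream

print(arb_mux(["cxs0", "cxs1", "cxs2"], ["io0", "io1"]))
# ['cxs0', 'io0', 'cxs1', 'io1', 'cxs2']
```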

CCIX over CXL

When the CXL controller is accessed via a CXS interface rather than the standard separate interfaces for CXL.cache and CXL.mem, then with proper protocol management on both sides of the link, virtually any protocol can be passed through the CXS streaming interface in the form of 64-byte (64B) flits. For the CCIX coherency protocol to be run over this interface, there needs to be a defined way to map it into the 64B flits used for CXL. There are many ways this could be done, and there is at least one specification draft from Arm that defines exactly how to do this². The CCIX Consortium could potentially leverage this specification, or develop one of its own for the same purpose, but as long as the applications on both ends of the link follow the same specification, the system will work.
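
As an illustration of the general idea (the actual mapping is defined in Arm's CCIX over 64B Flits document and is not reproduced here), the sketch below shows how variable-length protocol messages might be segmented into fixed 64-byte flits for transport over a CXS-style streaming interface; the layout is invented purely for clarity:

```python
# Illustration only: segmenting an arbitrary protocol's messages into fixed
# 64-byte flits for a CXS-style streaming interface. The real CCIX-over-CXL
# flit layout comes from Arm's specification; this packing is invented.

FLIT_BYTES = 64

def pack_into_flits(messages):
    """Concatenate variable-length messages and cut them into 64B flits,
    zero-padding the final flit if needed."""
    payload = b"".join(messages)
    flits = [payload[i:i + FLIT_BYTES] for i in range(0, len(payload), FLIT_BYTES)]
    if flits and len(flits[-1]) < FLIT_BYTES:
        flits[-1] = flits[-1].ljust(FLIT_BYTES, b"\x00")
    return flits

ccix_msgs = [b"\x01" * 48, b"\x02" * 80]       # two dummy protocol messages
flits = pack_into_flits(ccix_msgs)
print(len(flits), [len(f) for f in flits])     # 2 flits of exactly 64 bytes each
```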

The potential benefit of this approach is the ability to create symmetric coherent links between multiple SoCs using the CCIX protocol (implementing the CCIX coherency agents as needed in each SoC) while benefiting from the extremely low latency provided by the CXL links between the SoCs. This overcomes a significant limitation of CXL (no symmetric operation) while preserving its most significant benefit: low latency.

Summary

The CXL specification is still in its infancy, and it is hard to predict where it may go next and what additional applications and use models may become available. It is clear, however, that the bandwidth will double again, as the next iteration of the specification is expected to utilize the PCIe 6.0 PHY at 64 GT/s. The rapidly growing ecosystem has been contributing to CXL's success, giving designers access to new features and new interfaces that enable new CXL applications. CXL for memory expansion, further enhanced by CXL switching support, new CXL controller designs enabled by Arm's CXS interface, and the new possibility of transporting CCIX over CXL are among the many benefits emerging in the industry.

Synopsys, a leader in PCIe IP, has leveraged its expertise to deliver the industry's first CXL controller IP, in addition to its silicon-proven 32 GT/s PCIe 5.0 PHY for advanced FinFET processes, offering a complete DesignWare CXL IP solution that supports both the current CXL 1.1 specification and the next generation of CXL.

Synopsys support for the next generation of CXL includes the optional security/encryption features, comprising both security-enabled CXL controllers and CXL IDE modules that handle the actual AES-GCM encryption/decryption functions. These are designed to plug together seamlessly, providing a complete next-generation CXL security solution.

References

  1. Arm CCIX over 64B Flits, Document # ARM AES 0026, Beta Issue A, May 1, 2020