In early 2019, a new interconnect called Compute Express Link (CXL) was announced to the world. Building on the PCI Express (PCIe) physical layer and running at the then-fastest PCIe link speed of 32 GT/s, the new specification was touted as the ideal interconnect for computational coprocessors. Four years later, talk of such acceleration is nearly drowned out by seemingly endless waves of announcements about shared, pooled, disaggregated, and every other flavor of memory one can imagine that is NOT directly attached to the CPU. This article gives a brief background and explanation of CXL and helps guide the decision of whether it’s the right technology for your next SoC design.
When the CXL 1.0 spec was released, it brought the concept of FLITs (FLow-control unITs) into the PCIe world. Using the Alternate Protocol Negotiation feature introduced with the PCIe 5.0 spec, two link partners proceed through an otherwise normal PCIe link negotiation sequence, but instead of ending up in the PCIe “L0” state they enter CXL mode and begin exchanging CXL FLITs. Because CXL is built around a host-centric, asymmetric cache coherency protocol, latency is absolutely critical to its effectiveness, and the entire specification is designed to minimize it. A discussion of the details and tradeoffs made to achieve that goal is outside the scope of this article, but for the 64-byte transfers typical of many CPUs’ cachelines, CXL offers several times lower latency than the PCIe equivalent. CXL actually defines three protocols: CXL.io, CXL.cache, and CXL.mem. CXL.io transports PCIe for backwards compatibility, bulk data transfers, and system configuration purposes, with no performance gain over a native PCIe link. CXL.cache and CXL.mem are the real substance of the spec: they support only 64-byte transfers, but they deliver the remarkably low latency that has become CXL’s hallmark.
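To make the split between the three protocols concrete, here is a minimal C sketch of how a designer might model a 64-byte CXL.cache/CXL.mem transaction. The enum and field names are illustrative placeholders, not the FLIT encoding or transaction format defined in the CXL specification.

```c
#include <stdint.h>

/* Illustrative only: the names and layout below are simplified
 * placeholders, not the actual FLIT or transaction encoding from the
 * CXL specification. */

enum cxl_protocol {
    CXL_IO,     /* PCIe-compatible traffic: configuration, bulk DMA */
    CXL_CACHE,  /* device caching of host memory                    */
    CXL_MEM     /* host access to device-attached memory            */
};

#define CXL_CACHELINE_BYTES 64

/* A simplified view of one 64-byte transfer carried over CXL.cache
 * or CXL.mem. */
struct cxl_mem_xfer {
    enum cxl_protocol proto;                     /* carrying protocol      */
    uint64_t          host_phys_addr;            /* cacheline-aligned addr */
    uint8_t           data[CXL_CACHELINE_BYTES]; /* one cacheline payload  */
    uint8_t           is_write;                  /* 1 = write, 0 = read    */
};
```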
Figure 1: CXL Protocol Usage in a CXL System
Originally it seemed like the CXL.mem protocol was defined simply to allow a host CPU to access an accelerator’s local memory, but the spec was wisely written generically enough to enable that host to access arbitrary memory in the CXL fabric. Initially, there was interest in using CXL.mem as a way to add non-volatile memory to systems. Traditional storage interfaces are block-oriented and difficult to adapt to the random access patterns typical of CPUs. While it’s always been possible to map memory on interfaces such as PCI Express, the I/O focus of those interfaces combined with legacy software means that memory attached there cannot effectively be cached by the host CPU. CXL.mem’s ability to access memory from the CPU’s existing cache architecture enables designers to easily attach NAND flash and other non-volatile memories in a way that appears to the host CPU as if it is simply more system memory. CXL is really the first interface to offer widespread access to non-volatile system memory, and that opens up a new frontier for software architecture which blurs the lines between files and memory.
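As a rough illustration of that blurring, the sketch below shows how application software might map a byte-addressable region exposed by a CXL-attached non-volatile memory device directly into its address space. The /dev/dax0.0 device node and the Linux direct-access mapping path are assumptions made for the example; actual device enumeration and configuration are platform-specific.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical example: map a byte-addressable region backed by a
 * CXL-attached non-volatile memory device. The device path below is an
 * assumption for illustration purposes only. */
int main(void)
{
    const size_t len = 1 << 20;                    /* map 1 MiB           */
    int fd = open("/dev/dax0.0", O_RDWR);          /* assumed device node */
    if (fd < 0) { perror("open"); return 1; }

    void *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    /* Loads and stores now go through the CPU cache hierarchy, just as
     * they would for ordinary system memory. */
    memcpy(mem, "persistent data", 16);

    munmap(mem, len);
    close(fd);
    return 0;
}
```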
It didn’t take long for system architects to extend this same vision of CXL-attached memory to traditional volatile memory technologies. Moving system memory away from the CPU makes it far more flexible. Where traditional architectures can leave unused memory attached to one specific CPU inaccessible to other CPUs, memory residing on a CXL interface can be allocated to CPU A for some period of time and to CPU B at another. As CXL fabrics extend to box-to-box environments, one can imagine a near future in which memory from multiple physical servers is pooled to service a memory-intensive job. Many segments of the datacenter market find this flexibility in memory placement extremely attractive, and the market has exploded over the last couple of years. CXL.mem devices now dominate discussion at industry events such as the Flash Memory Summit, Open Compute Project summits, and more.
Figure 2: CXL.mem Enables Memory Everywhere
The availability of CXL-attached memory products also gives designers the flexibility to use CXL interfaces for off-chip memory. In the past, an SoC designer had little choice other than to provision a DDR interface in order to access large amounts of off-chip memory. This is quite limiting when the same SoC might be used in configurations with no off-chip memory. By instead using a CXL interface, the SoC architect can repurpose that interface when it’s not needed for memory attachment.
SoC designers looking to incorporate CXL.mem into their architectures have many options available. Naturally the first attribute to consider, regardless of planned usage, is latency. It may not be immediately obvious, but standardized interfaces typically used for on-chip connectivity are usually NOT well suited to CXL.mem. Bridging between CXL.mem and an existing interface is going to add latency – potentially even requiring store-and-forward buffering to account for protocol differences. Because of that, architects and designers are well advised to connect their CXL interface as logically close to their actual buffers and data movers as possible.
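The sketch below illustrates that advice with a back-of-the-envelope latency budget. All of the numbers are placeholder assumptions chosen only to show how extra bridging hops compound on both the request and response paths; they are not measured or specified values.

```c
#include <stdio.h>

/* Back-of-the-envelope latency budget for a CXL.mem read, in
 * nanoseconds. Every number below is a placeholder assumption for
 * illustration, not a measured or specified value. */
int main(void)
{
    double phy           = 10.0;  /* assumed PHY latency per direction        */
    double controller    = 15.0;  /* assumed controller pipeline per direction */
    double bridge        = 20.0;  /* assumed extra on-chip bridge/fabric hop   */
    double memory_access = 60.0;  /* assumed media access time                 */

    /* The request and response each cross the PHY and controller once;
     * an added bridge is paid in both directions. */
    double direct  = 2.0 * (phy + controller) + memory_access;
    double bridged = 2.0 * (phy + controller + bridge) + memory_access;

    printf("direct-attached datapath : %.0f ns round trip\n", direct);
    printf("with extra bridge hops   : %.0f ns round trip\n", bridged);
    return 0;
}
```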
Also less obvious is the PHY’s contribution to overall CXL latency. Because CXL’s overall latency is so low, small variations in PHY latency that were of little or no concern in PCIe designs can now have a measurable effect. For the same reason, most commercially available CXL solutions use the PIPE SerDes Architecture (as opposed to the Original PIPE Architecture), because it allows the CXL controller to optimize the datapath from the PHY.
An SoC can connect to a CXL link in one of two roles. If the SoC is providing memory to the system, it connects as a CXL.mem Device; if it is connecting to a CXL memory device, it connects as a CXL.mem Host. For maximum flexibility, designers will likely want their CXL interfaces to operate in either mode, Host or Device, so the SoC architecture needs to consider data flow and control for both. The CXL.mem protocol itself is straightforward, so Host functionality can be largely software-driven in most implementations. Devices, on the other hand, range from simple to extremely complex. Designers should also be aware that CXL differs from many other interface specifications by including detailed device-level controls as part of the interface specification: various quality-of-service monitors and controls are specified, as are address decoding and partitioning, so CXL.mem Device designers need to pay close attention when developing the internal control and status interfaces for their designs.
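As a rough illustration, the structures below sketch the kinds of device-level controls a CXL.mem Device design might expose through its internal control and status interface. The names, fields, and layouts are hypothetical placeholders, not the register definitions from the CXL specification.

```c
#include <stdint.h>

/* Hypothetical placeholders: these structures only sketch the classes of
 * device-level controls the CXL spec calls for (QoS telemetry, address
 * decode, partitioning); they are not the spec's register layouts. */

struct cxlmem_qos_status {
    uint16_t read_backpressure_pct;   /* observed read-path loading  */
    uint16_t write_backpressure_pct;  /* observed write-path loading */
};

struct cxlmem_decoder {
    uint64_t base;        /* host physical base of the decoded range    */
    uint64_t size;        /* size of the decoded range in bytes         */
    uint8_t  interleave;  /* interleave ways for multi-device setups    */
    uint8_t  enabled;     /* decoder active flag                        */
};

/* Per-port control block an SoC's internal CSR interface might expose. */
struct cxlmem_device_ctrl {
    uint8_t                  is_host_mode;  /* run-time Host/Device select   */
    struct cxlmem_qos_status qos;           /* QoS monitoring state          */
    struct cxlmem_decoder    decoders[4];   /* decoder count is arbitrary here */
};
```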
CXL.mem offers both system and SoC designers significant flexibility in implementing low-latency memory solutions. SoC designers need to focus on reducing their overall latency and plan early for the specification-required application logic when implementing CXL.mem Devices. Synopsys offers a wide range of CXL and PCIe controller and PHY IP, including Dual-Mode controllers (both CXL Host and CXL Device, run-time selectable in a single controller) and Switch port support, and has the expertise to help guide SoC architecture and design choices to achieve the best possible latency and performance.