From the dawn of civilization through 2003, roughly five exabytes of data were created in total. By 2025, global data creation is expected to reach 180 zettabytes. This means that within the span of a single generation, we've created roughly 36,000 times the amount of data ever created before—that's a lot of data! To accommodate this data explosion, the installed base of storage capacity is expected to increase at a 19.2% CAGR through 2025, and the data center accelerator market is expected to grow along with it.
It doesn't stop there.
Managing data—created, copied, stored, consumed, and otherwise proliferated from the data center to the edge—creates unique challenges for SoC designers. This includes mounting pressure to move the data through systems faster and with greater efficiency and security: Lower power. Smaller area. Lower latency. And with data confidentiality and integrity. It's essential for the interconnects in multi-die systems to have low latency along with enough flexibility to manage a variety of bandwidths and throughput. Complying with the right industry standards can help ensure design success.
One of the newer kids on the standards block—and quickly gaining traction—is Compute Express Link (CXL), an open interface specification with its own consortium for processors, accelerators, and memory expansion. Read on to learn more about the CXL standard and when you might want to consider CXL for improving latency in your next SoC design.
The CXL standard is made up of three protocols that are negotiated and carried within one link leveraging the PCI Express® (PCIe®) electrical layer: CXL.io, CXL.cache, and CXL.memory (CXL.mem). Each of these protocols has its own stack, multiplexed at the PHY level, and each is suited to a different role:
- CXL.io handles device discovery, configuration, interrupts, and standard I/O, and is functionally based on PCIe
- CXL.cache lets a device such as an accelerator coherently cache host memory
- CXL.mem lets the host access device-attached memory with simple load/store semantics
Having low-latency interfaces such as CXL unlocks new ways of computing, such as efficient heterogeneous computing architectures, accelerated data-intensive workloads, and advanced real-time analytics. CXL's computational offloading and memory pooling, coupled with the ability to interoperate with the ubiquitous PCIe standard, open up a wide array of design possibilities. CXL extends the new paradigm of disaggregation and composability in multi-die systems to include cache and memory.
If a processor offers a CXL interface, accelerators can have access to the same data as the processor, avoiding the need to replicate data across the system.
Here's an example of how this helps the efficiency of your system:
Imagine you are designing a security camera application. There's a physical camera, and it dumps frames of data into system memory at maybe 30, 60, or 100 frames per second, or more. The processor takes those frames of data in memory, and it recognizes a face, and another face, and another. The processor needs to parse out which face is Ted, which is Michael, and which is Sophia.
In the past, doing this kind of operation meant a lot of back and forth of control and copying of data. The CPU would have to tell the driver to copy the frames of data from memory and deliver them to the accelerator through the system bus. After the data was delivered to a memory buffer in the accelerator, the accelerator would analyze the data to determine who those faces were. All that data would then have to travel back through the system to the CPU, which would write the names associated with the faces into memory.
With CXL, instead of the driver copying the face data over to a buffer on the accelerator through the system bus, the accelerator has direct, coherent access to system memory. This means that the CPU can simply send pointers to the accelerator that say (for instance), "look at addresses 1,000,000, 1,100,000, and 1,200,000 in memory. Those are faces. Let me know who those faces are." The accelerator can update the system memory directly, identifying the faces as Ted, Michael, and Sophia without sending the data back and forth through the system.
With CXL, data only gets moved when the co-processor needs it, and even then, when it accesses a face, it does not copy the entire frame across the system bus; it copies only the information that is absolutely necessary. This equates to less software overhead and latency, freeing your system up for better die-to-die communications.
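To make the data-movement difference concrete, here is a minimal C sketch of the two flows. It is purely illustrative: the frame size, the face offsets, and the commented-out accelerator calls are hypothetical placeholders, not part of the CXL specification or any 草榴社区 product.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define FRAME_BYTES (1920u * 1080u * 2u)   /* one illustrative camera frame */

/* Legacy flow: the driver copies every frame into device-local memory. */
void classify_faces_copy(const uint8_t *frames, size_t n_frames,
                         uint8_t *accel_buf /* buffer in accelerator memory */)
{
    for (size_t i = 0; i < n_frames; i++) {
        /* The whole frame crosses the system bus... */
        memcpy(accel_buf, frames + i * FRAME_BYTES, FRAME_BYTES);
        /* ...the accelerator works on its private copy,
         * and the results are copied back for the CPU to record. */
    }
}

/* CXL-style flow: the CPU hands over pointers; the accelerator, which can
 * coherently access host memory over CXL.cache, pulls only the bytes it
 * touches and writes the recognized names straight back to system memory. */
void classify_faces_shared(const uint8_t *frames, const size_t *face_offsets,
                           size_t n_faces)
{
    for (size_t i = 0; i < n_faces; i++) {
        const uint8_t *face = frames + face_offsets[i];
        (void)face;   /* hypothetical: submit this address to the accelerator */
    }
}
```

The point of the sketch is the traffic pattern, not the API: in the legacy flow every frame crosses the bus twice, while in the shared-memory flow only the addressed regions move, and only when the accelerator actually touches them.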
With today's demands in multi-die systems, the number of processor cores is rapidly growing. And greater numbers of processor cores equate to greater amounts of required memory. At the same time, you may not need all of a one-size-fits-all memory allocation for the cores. The CXL protocol solves this by allowing memory expansion with coherence, meaning that processor cores can share memory resources in a way that increases efficiency.
Prior to CXL, designers who wanted memory that would persist even if the power was cut had to make non-volatile memory interfaces look like DRAM. While the methods to make products look like DRAM might work great for dynamic memory applications, they are anything but streamlined for non-volatile memory. It's like trying to hammer a square peg into a round hole. A few companies even went so far as to build products that look like DRAM and physically plug into a DIMM socket on your server. That way, the DRAM could be copied off to NAND Flash or MRAMs when the power failed, or the DRAM could otherwise interface to more complex technology. To do all of this effectively, you had to be really clever and creative, and then? Well, you needed to hope for the best.
Enter CXL (because hope really isn't a great strategy). While DRAM solutions work well in close physical proximity to your processor, once you're running hundreds of signals across even a few inches of PC board, you run into all kinds of skew problems. CXL.mem, riding on a serial PHY, is better suited for moving data over longer distances.
In addition, the CXL.mem part of the specification enables you to:
- Attach volatile or persistent (non-volatile) memory behind a simple load/store interface, without making it masquerade as DRAM
- Expand memory capacity and bandwidth beyond what the processor's native DRAM channels provide
- Pool memory so that multiple hosts and devices can share a common resource
Imagine if your system could talk to your disk drive as if it were memory and you didn't have to worry about sectors, heads, and tracks. What if your system could get 3 bytes of memory from a device in one place and 100 bytes from a device in a different place? In short, what if memory was pooled as a common resource?
In the past, if you rented 100 processor cores from a server farm, you probably also rented 800 GB (or some other fixed amount) of memory. But maybe you didn't need all that memory, and so some of it was stranded (never used). CXL mitigates this with memory pooling, so you can converge on near-perfect memory utilization, reducing both stranded capacity and latency.
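As a back-of-the-envelope illustration of stranded memory, here is a tiny C sketch; the host counts and demand figures are made up for the example and are not drawn from any real deployment.

```c
#include <stdio.h>

int main(void)
{
    /* Fixed provisioning: every host gets the same slice, needed or not. */
    const double fixed_per_host_gb = 200.0;                     /* 4 hosts x 200 GB = 800 GB */
    const double demand_gb[] = { 120.0, 60.0, 180.0, 140.0 };   /* hypothetical workloads */
    const int hosts = (int)(sizeof(demand_gb) / sizeof(demand_gb[0]));

    double fixed_total = hosts * fixed_per_host_gb;
    double used = 0.0;
    for (int i = 0; i < hosts; i++)
        used += demand_gb[i];

    /* Stranded memory = provisioned but never touched. */
    printf("Fixed:  %.0f GB installed, %.0f GB used, %.0f GB stranded\n",
           fixed_total, used, fixed_total - used);

    /* Pooled: hosts draw from one shared pool on demand, so capacity only
     * needs to cover aggregate demand rather than the sum of worst cases. */
    printf("Pooled: %.0f GB covers the same workloads\n", used);
    return 0;
}
```

The same arithmetic scales to racks: the more hosts that share a pool, the closer aggregate provisioning can track aggregate demand.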
With CXL, you can create virtual machines that have the right mixture of memory, processing, and acceleration for your specific job. 草榴社区 offers a comprehensive solution for implementing CXL to help you get started, including our CXL PHY and controllers with IDE security and verification IP. 草榴社区 CXL controllers behave like a superset of a PCIe controller, leveraging the speed of PCIe along with the PHY. We also have hardware help in the form of IP prototyping kits.
草榴社区 Verification IP and protocol verification solutions for CXL (up to 3.0) on 草榴社区 ZeBu® and HAPS® hardware-assisted platforms provide an IP-to-system-level methodology to verify CXL bus latency and identify system bottlenecks in compliance with Chapter 13 of the CXL specification.
As part of our complete solution, our experts can help you make the right decisions for your subsystems. This is useful when you are building very complex SoCs—for example, 20 different combinations of bifurcation cases (16-lane, 2×8-lane, 8×2-lane, 4×4-lane, etc.). Handling that many combinations requires instantiating a lot of different controllers and managing the various clock and reset logic carefully—potentially even integrating the PHYs into a single subsystem. We have deep experience helping customers on the leading edge of adoption and beyond, and we use this background to help you.
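To illustrate why the controller count changes with each split, here is a small, purely illustrative C table of a few bifurcation cases; the names and figures are placeholders for the kinds of configurations listed above, not a product configuration.

```c
#include <stdio.h>

/* A few of the bifurcation cases mentioned above (illustrative only):
 * the same 16 PHY lanes, split across different numbers of controllers. */
struct bifurcation_case {
    const char *name;
    int controllers;           /* controller instances to drop in */
    int lanes_per_controller;  /* link width each controller drives */
};

int main(void)
{
    const struct bifurcation_case cases[] = {
        { "1x16", 1, 16 },
        { "2x8",  2,  8 },
        { "4x4",  4,  4 },
        { "8x2",  8,  2 },
    };
    const int n = (int)(sizeof(cases) / sizeof(cases[0]));

    /* Each split keeps the lane total constant, but the number of controller
     * instances, and therefore the clock/reset plumbing, changes every time. */
    for (int i = 0; i < n; i++)
        printf("%-4s -> %d controller(s) x %d lanes = %d lanes total\n",
               cases[i].name, cases[i].controllers, cases[i].lanes_per_controller,
               cases[i].controllers * cases[i].lanes_per_controller);
    return 0;
}
```

In a real subsystem, each controller instance also brings its own clock domain crossings and reset sequencing, which is where careful integration pays off.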
Not only can we help ease your design journey and lower your risk, but we've also got you covered with interface security. We have Integrity and Data Encryption (IDE) solutions for both PCIe and CXL, providing confidentiality, integrity, and replay protection, including support for the TEE Device Interface Security Protocol (TDISP) for virtualized environments, and more. Our IDE security module gives you a complete, fully integrated, and configurable solution with very low latency overhead.
To learn more, check out how XConn achieved first-time silicon success for its CXL switch SoC with 草榴社区 CXL and PCIe IP products, or download our 草榴社区 IDE Secure Module for CXL 2.0 datasheet.