
The Importance of Memory Architecture for AI SoCs

Jamil Kawa

Dec 07, 2023 / 3 min read

The rapid advance of artificial intelligence (AI) is impacting everything from how we drive to how we make business decisions and shop. Enabled by the massive and growing volume of big data, AI is also causing compute demand to balloon. In fact, the most recent generative AI models require substantially more compute to train than the previous generation, which is, in turn, doubling overall demand about every six months.
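To make that growth rate concrete, here is a minimal Python sketch of what a six-month doubling period implies over several years. It assumes a clean, constant doubling period, which is a simplification, and the function name is illustrative rather than taken from any tool.

```python
def demand_multiplier(years: float, doubling_period_years: float = 0.5) -> float:
    """Return how many times demand has grown after `years`,
    assuming it doubles once every `doubling_period_years`."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Print the implied growth for a few planning horizons.
    for years in (1, 2, 5):
        print(f"{years} yr: x{demand_multiplier(years):,.0f}")
```

Under that assumption, demand grows 4x in one year, 16x in two, and about 1,000x in five, which is why memory bandwidth and energy per access become first-order design constraints.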

As you might expect, this has led to a computing transformation that has, in part, been made possible by new types of memory architectures. These advanced graphics processing unit (GPU) architectures are opening up dramatic new possibilities for designers. The key is choosing the right memory architecture for the task at hand and the right memory to deploy for that architecture.

To be sure, there is an array of more efficient emerging memories out there for specific tasks. They include compute-in-memory SRAM (CIM), STT-MRAM, SOT-MRAM, ReRAM, CB-RAM, and PCM. While each has different properties, collectively they enhance compute power while raising energy efficiency and reducing cost. These are key factors that must be considered to develop economical and sustainable AI SoCs.

Many considerations affect a designer’s choice of architecture according to the priorities of any given application. These include throughput, modularity and scalability, thermal management, speed, reliability, processing compatibility with CMOS, power delivery, cost, and the need for analog behavior that mimics human neurons. 

Let’s examine the features of the assorted emerging memories currently at a designer’s disposal.

1. SRAM and ReRAM: A choice between two extremes for compute-in-memory

For efficient compute-in-memory chips, designers must opt for either SRAM or ReRAM, which are opposites in terms of power and scalability. SRAM is recommended for small memory blocks and high-speed requirements but comes with drawbacks such as its substantial area and higher power consumption. For projects with significant memory requirements, ReRAM offers an advantage owing to its density. ReRAM also offers the additional benefit of analog behavior when needed. To be clear, most (if not all) emerging memories can be deployed for compute-in-memory, but SRAM is the best choice for performance and ReRAM is the best choice for density and energy efficiency.

2. MRAM: Low-power revolution

MRAM, specifically STT-MRAM and SOT-MRAM, is a non-volatile, ultra-low-power memory option fully compatible with CMOS processing. Where MRAM was traditionally slow and difficult to scale, STT-MRAM changed the game by writing each cell with a spin-polarized current that exerts a “spin-transfer torque” on the magnetic layer. STT-MRAM offers fast write times as low as 1ns and is widely used in IoT applications. SOT-MRAM is a variation that relies on “spin-orbit torque” instead and allows for even faster read and write times with identical functionality (though that comes at the price of a larger area).

Shared features include low leakage, scalability, high retention time, and high durability, as well as ease of integration with CMOS.

3. PCM, ReRAM, CB-RAM: Cost-effective storage and connectivity

These memory types are all resistive non-volatile memories; PCM is the phase-change member of the group, while ReRAM and CB-RAM switch by forming and dissolving a conductive filament. Each cell has two distinct states of low and high resistance, set by the current pulse applied between the two electrodes forming the memory (its direction in bipolar ReRAM and CB-RAM, its amplitude and duration in PCM). Because the resistance can also be programmed to intermediate values, PCM, ReRAM, and CB-RAM are well suited to storing analog data such as video, audio, and images, reducing the need for expensive digital-to-analog converters. A further benefit is in training neural networks, where a cell's programmable resistance can represent the strength of a synaptic connection. All are easily integrated with CMOS and in 3D stacking. ReRAM cross-bar arrays are particularly well suited to in-memory computing, and CB-RAM is likewise a good fit for in-memory computing and for implementing neural networks.
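The in-memory computing role of a cross-bar array can be sketched in a few lines of Python. In an idealized array, each cell stores a weight as a conductance G[i][j]; driving row i with voltage V[i] produces a column current I[j] equal to the sum over i of V[i] * G[i][j] (Ohm's law per cell, Kirchhoff's current law per column), so the array itself performs a matrix-vector multiply. The model below is an illustrative assumption, not a device model: it ignores wire resistance, device variation, nonlinearity, and the converters at the array edge, and all names and values are hypothetical.

```python
from typing import List


def crossbar_mvm(voltages: List[float],
                 conductances: List[List[float]]) -> List[float]:
    """Idealized crossbar: rows are driven by voltages (V), cells hold
    conductances (S); returns the current (A) collected on each column."""
    n_rows = len(conductances)
    if n_rows == 0 or len(voltages) != n_rows:
        raise ValueError("need a non-empty array and one voltage per row")
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            # Each cell contributes I = V * G to its column's total current.
            currents[j] += v * conductances[i][j]
    return currents


if __name__ == "__main__":
    # Hypothetical 3x2 array: conductances in siemens, inputs in volts.
    G = [[1e-6, 2e-6],
         [3e-6, 4e-6],
         [5e-6, 6e-6]]
    V = [0.1, 0.2, 0.3]
    print(crossbar_mvm(V, G))
```

The point of the sketch is that the multiply-accumulate happens where the weights are stored, so no weight data crosses a memory bus during the operation.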

While no single memory type is the silver bullet for all AI chips, each has its advantages in terms of the space it takes up, capacity, retention, cost, ability to stack, endurance, and more. Each memory challenge can be addressed in multiple ways, with more than one suitable alternative that can be considered to meet the same objectives. Designers need to weigh both pros and cons for each alternative, including further scalability implications for architectural decisions.
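As a rough aid to that weighing, the qualitative guidance in this article can be captured in a small lookup. The Python sketch below encodes only strengths stated in the text above; the priority labels and the function name are hypothetical, and a real selection would rest on measured data for the target process and workload.

```python
from typing import Dict, Iterable, List, Set

# Qualitative strengths as described in this article (not measured data).
MRAM_TRAITS = {"non-volatile", "low power", "cmos integration",
               "endurance", "retention"}
RESISTIVE_TRAITS = {"non-volatile", "analog behavior",
                    "cmos integration", "3d stacking"}

STRENGTHS: Dict[str, Set[str]] = {
    "SRAM": {"performance", "low latency"},
    "ReRAM": RESISTIVE_TRAITS | {"density", "energy efficiency"},
    "STT-MRAM": set(MRAM_TRAITS),
    "SOT-MRAM": set(MRAM_TRAITS),
    "PCM": set(RESISTIVE_TRAITS),
    "CB-RAM": set(RESISTIVE_TRAITS),
}


def shortlist(priorities: Iterable[str]) -> List[str]:
    """Return the memory types whose listed strengths cover every priority."""
    wanted = set(priorities)
    return sorted(name for name, traits in STRENGTHS.items()
                  if wanted <= traits)


if __name__ == "__main__":
    print(shortlist(["density", "analog behavior"]))
    print(shortlist(["non-volatile", "low power"]))
```

A shortlist with several entries reflects the point made above: more than one alternative can meet the same objectives, and the final choice comes down to the trade-offs the lookup does not capture, such as area and cost.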

Neuromorphic Computing vs. Traditional SRAM

The emerging non-volatile memories discussed above represent the foundation of neuromorphic computing and lend themselves to non-Von Neumann architectures. While they offer exciting potential, classical SRAM memories remain important. 

With unequaled latency, SRAM is still the backbone of AI and machine learning (ML) architectures for neuromorphic computing. Multi-port memories enable parallelism to accommodate CIM and near-memory computing. 草榴社区 supports CIM research and sees near-memory computing as the most energy-efficient, versatile form of computing. We offer a family of multi-port SRAMs that are energy efficient and can operate at ultra-low voltages. 

Neuromorphic computing is driving AI growth and fueling demand for more computational power through its reliance on new architectures and new memory technologies that are more efficient than current processing architectures. In particular, compute-in-memory and near-memory computing are vital to this new AI age, as are expertise in memory yield, test, reliability, and implementation. It is not an exaggeration to say that the ecosystem depends on compact, non-volatile, energy-efficient memories. Designers can and must take advantage of them to parse the massive volume of data and distributed memory and deliver the compute capabilities necessary to power our “smart everything” future.

For more details on this topic, read our technical bulletin, "Selecting Memory Architectures for AI SoCs."
