
Selecting Memory Architectures for AI SoCs

Jamil Kawa, 草榴社区 Fellow, 草榴社区

Introduction

The rapid pace of machine learning (ML) and artificial intelligence (AI) development is changing the world of computing at every level: hardware architecture, software, chip manufacturing, and system packaging. Two major developments have opened the doors to implementing new techniques in machine learning. First, vast amounts of data, i.e., “Big Data,” are available for systems to process. Second, advanced GPU architectures now support highly parallel, distributed computing. With these two developments, designers can take advantage of new techniques that rely on intensive computation and massive amounts of distributed memory to deliver powerful new compute capabilities.

Neuromorphic computing-based machine learning draws on techniques such as spiking neural networks (SNNs), deep neural networks (DNNs), and restricted Boltzmann machines (RBMs). Combined with Big Data, “Big Compute” increasingly employs statistically based high-dimensional computing (HDC), which operates on patterns and supports reasoning built on associative memory and continuous learning, mimicking how human memory learns and retains information.
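
To make the HDC idea concrete, the following minimal sketch (in Python, with illustrative names, dimensionality, and seed that are not from this paper) encodes a composite pattern as a high-dimensional bipolar vector and recalls a stored item through associative, similarity-based lookup:

    # A minimal HDC sketch: symbols are random bipolar hypervectors; binding
    # (elementwise multiply) and bundling (addition) compose a pattern, and an
    # associative memory recalls the closest stored item by dot-product similarity.
    import numpy as np

    rng = np.random.default_rng(0)
    D = 10_000  # hypervector dimensionality (illustrative)

    items = {name: rng.choice([-1, 1], size=D)
             for name in ["color", "red", "shape", "round"]}

    # Bind role-filler pairs, then bundle them into one pattern ("a red, round thing")
    pattern = np.sign(items["color"] * items["red"]
                      + items["shape"] * items["round"])

    # Associative recall: unbind the "color" role and find the most similar item
    query = pattern * items["color"]
    best = max(items, key=lambda name: int(np.dot(items[name], query)))
    print(best)  # -> "red" with overwhelming probability at D = 10,000

Because the lookup is driven by similarity rather than exact addresses, the same mechanism tolerates noise and lends itself to continuous, incremental learning.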

Emerging memories include compute-in-memory SRAMs, STT-MRAMs, SOT-MRAMs, ReRAMs, CB-RAMs, and PCMs. Each type is being developed to help transform computation for AI, and together they are advancing the scale of computational capability, energy efficiency, density, and cost.

Nine Challenges of Selecting Memory Architectures for ML/AI Computing

Several challenges face system designers in choosing the optimal computing architecture and the associated combination of memories supporting their objectives for an ML/AI application. Although designers utilize traditional embedded SRAM, caches, and register files today, no single memory solution, generic or exotic, can satisfy the AI workloads now in development. Moreover, because machine learning is projected to account for a majority of a system's energy consumption, optimizing memories for machine learning helps designers meet their power budgets. This has major implications for system design.

Designers balance the requirements of their designs as they determine which of the nine major challenges are most critical at a given time:

  1. Throughput as a function of energy (peta-ops per watt); see the short calculation after this list
  2. Modularity and scalability for design reuse
  3. Thermal management to lower cost, complexity, and size
  4. Speed to support real-time AI-based decision making
  5. Reliability, especially for applications where human life is at stake
  6. Process compatibility with CMOS for the components constituting a system; for example, STT-MRAM integrates easily with a CMOS-based processor
  7. Power delivery
  8. Cost, best expressed as the “sweet spot” node for a function together with integration (packaging) cost
  9. Analog behavior that mimics human neurons
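
To make challenge 1 concrete, the sketch below ranks three hypothetical design points by energy efficiency. All throughput and power figures are made-up placeholders, not measurements of any real device:

    # Rank hypothetical design points by energy efficiency (TOPS/W).
    # All throughput and power figures are illustrative placeholders.
    design_points = {
        # name: (throughput in tera-ops/s, power in watts)
        "edge_sram_cim":  (4.0,   0.5),
        "datacenter_gpu": (250.0, 300.0),
        "reram_crossbar": (20.0,  1.0),
    }

    for name, (tops, watts) in sorted(design_points.items(),
                                      key=lambda kv: -kv[1][0] / kv[1][1]):
        print(f"{name:15s} {tops / watts:7.1f} TOPS/W")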

Each of these memory challenges can be addressed in multiple ways; there is usually more than one alternative for the same objective. Each alternative has pros and cons, including scalability implications for architectural decisions.

For example, designers must choose between an SRAM array and a ReRAM array for compute-in-memory; the power and scalability implications of the two options are at opposite extremes. SRAM is the right choice when the memory block is relatively small, the required execution speed is high, and integrating the in-memory compute within a system-on-chip (SoC) is naturally the most logical option (although SRAM is costly in area and in power consumption, both dynamic and leakage). On the other hand, the highly parallelized matrix multiplication typical of deep neural networks requires a huge amount of memory and makes the argument for ReRAM because of its density advantage, as sketched below.
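
Why a ReRAM array suits that workload can be seen in a simple behavioral model: a crossbar stores the weight matrix as cell conductances and produces every output of a matrix-vector multiply in one analog step (Ohm's law at each cell, Kirchhoff's current law on each bitline). The Python sketch below is a conceptual model under assumed conductance range and read-noise values, not a circuit-accurate simulation:

    # Behavioral model of a ReRAM crossbar computing y = W @ x in one analog step.
    # Signed weights use a differential pair of conductance arrays; the maximum
    # conductance and read-noise level are illustrative assumptions.
    import numpy as np

    rng = np.random.default_rng(1)

    def crossbar_mvm(weights, x, g_max=1e-4, noise=0.01):
        w_max = np.abs(weights).max()
        g_pos = np.clip(weights, 0, None) / w_max * g_max   # positive weights
        g_neg = np.clip(-weights, 0, None) / w_max * g_max  # negative weights
        # Inputs drive the wordlines; each bitline sums the currents of its cells
        current = g_pos @ x - g_neg @ x
        current += rng.normal(0.0, noise * g_max, current.shape)  # analog read noise
        return current * (w_max / g_max)   # rescale currents back to weight units

    W = rng.standard_normal((64, 64))
    x = rng.standard_normal(64)
    print(np.allclose(crossbar_mvm(W, x), W @ x, atol=0.5))  # True: close, not exact

The density advantage compounds this: each cell is both storage and multiplier, so the weight matrix never moves, eliminating exactly the data movement that dominates energy in a conventional fetch-compute-store loop.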

Multi-port SRAMs play a special and unique role in compute-in-memory architectures: Boolean logic functions are multi-input operations, so they require the ability to simultaneously read data from multiple addressable locations and write the results back to the desired memory locations. Multi-port SRAMs and register files offer precisely that flexibility, as modeled below. Multi-port SRAMs can also be used to construct the register files that are crucial to efficient multi-threading in GPUs.
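
A behavioral sketch makes the point. The hypothetical 2-read/1-write (2R1W) array below fetches two operands and commits a Boolean result in the same cycle, whereas a single-port memory would need three separate accesses:

    # Behavioral model of a 2-read/1-write (2R1W) multi-port memory: two operand
    # reads and one result write complete in a single cycle. This is a software
    # illustration of the access pattern, not an RTL or bit-cell design.
    class MultiPortSRAM:
        def __init__(self, depth, width=32):
            self.mem = [0] * depth
            self.mask = (1 << width) - 1

        def cycle(self, raddr_a, raddr_b, waddr, op):
            """One cycle: two simultaneous reads and one write of op(a, b)."""
            a, b = self.mem[raddr_a], self.mem[raddr_b]   # both read ports fire
            self.mem[waddr] = op(a, b) & self.mask        # write port commits
            return self.mem[waddr]

    ram = MultiPortSRAM(depth=16)
    ram.mem[0], ram.mem[1] = 0b1100, 0b1010
    print(bin(ram.cycle(0, 1, 2, lambda a, b: a & b)))   # 0b1000: AND in one cycle
    print(bin(ram.cycle(0, 1, 3, lambda a, b: a | b)))   # 0b1110: OR in one cycle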

Understanding the Range of Emerging Memories

The most prominent emerging memories are STT-MRAM, SOT-MRAM, ReRAM, CB-RAM, FeRAM, and PCM. Rather than detailing the make-up of each particular memory, it is more helpful when selecting memories to understand the main features that make them major candidates for neuromorphic computing architectures (as well as candidates for universal memory). You can find detail on each of these memories in the 草榴社区 white paper, “Neuromorphic Computing Drives the Landscape of Emerging Memories for Artificial Intelligence SoCs.”

Table 1 compares the emerging memories often considered for neuromorphic computing, used as off-chip solutions, against traditional on-chip SRAM and register files. It reflects the most representative numbers for each technology as of early 2020; in most cases, each figure is sampled from one source among many.

Table 1: Comparison between emerging memories for neuromorphic computing shows that no single memory type can be the “perfect” memory for all AI chips, but each has its advantages

草榴社区 Addressing Memories for Neuromorphic Computing

The list of memories involved in neuromorphic computing is not complete without addressing classical SRAM. SRAMs and register files remain the backbone of AI/ML architectures for neuromorphic computing, with the lowest latency of any memory category. However, the overriding goal of maximizing the TOPS/W metric for AI applications on von Neumann architectures dictates parallelism, which can be accomplished with multi-port memories offering maximum configuration flexibility to accommodate compute-in-memory and near-memory computing. 草榴社区 actively supports research in compute-in-memory while supporting near-memory computing as the most energy-efficient yet versatile form of computing.
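
The energy argument behind that position can be made with a back-of-envelope model. The sketch below uses ballpark per-operation energies of the kind frequently cited in the literature for a mid-2010s process node (illustrative orders of magnitude, not vendor data) to compare a MAC whose operands sit in a small nearby SRAM against one whose operands stream from off-chip DRAM:

    # Back-of-envelope model of why near-memory computing raises TOPS/W:
    # moving an operand often costs far more energy than computing with it.
    # Per-operation energies are rough, commonly cited orders of magnitude.
    ENERGY_PJ = {
        "mac_32b":        3.2,     # 32-bit multiply-accumulate (approx.)
        "sram_local_32b": 5.0,     # small on-chip SRAM read (approx.)
        "dram_32b":       640.0,   # off-chip DRAM access (approx.)
    }

    def energy_per_mac(fetches_from):
        """Energy of one MAC plus fetching its two operands from `fetches_from`."""
        return ENERGY_PJ["mac_32b"] + 2 * ENERGY_PJ[fetches_from]

    near = energy_per_mac("sram_local_32b")   # operands held near the compute
    far  = energy_per_mac("dram_32b")         # operands streamed from off-chip
    print(f"near-memory: {near:7.1f} pJ/MAC")
    print(f"off-chip:    {far:7.1f} pJ/MAC  ({far / near:.0f}x more energy)")

Roughly two orders of magnitude separate the two cases, which is why keeping operands near, or inside, the memory array matters more than any plausible improvement in the arithmetic itself.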

Summary

The era of Big Data and Big Compute is here. Per OpenAI, the compute consumed by deep learning has doubled roughly every three and a half months since 2012. Neuromorphic computing with deep neural networks is driving AI growth; however, it is heavily dependent on compact, non-volatile, energy-efficient memories whose varied features suit different situations, including STT-MRAMs, SOT-MRAMs, ReRAMs, CB-RAMs, and PCMs. Neuromorphic computing relies on new architectures, new memory technologies, and processing that is more efficient than current architectures, and it requires compute-in-memory and near-memory computing as well as expertise in memory yield, test, reliability, and implementation.