
Artificial Intelligence (AI) has become pervasive in recent years and has rapidly established itself as a transformative technology. AI is powered by machine learning (ML) algorithms, which require massive computational power. Designers have traditionally relied on graphics processing units (GPUs) to execute these ML algorithms. Originally developed for graphics rendering, GPUs have proven well suited for performing the matrix and vector operations essential to machine learning. However, the AI hardware landscape is undergoing dramatic changes. The increasing complexity of computational requirements and the need for improved energy efficiency are driving the emergence of startups specializing in domain-specific AI processors. These startups are developing specialized AI processors with architectures optimized for ML algorithms, delivering significantly improved performance per watt compared to general-purpose GPUs. 

As AI technology continues to advance, the demand for greater computational power and energy efficiency will continue to increase. According to an analysis by SemiAnalysis, AI data center power needs are projected to surpass those of non-AI data centers by 2028, accounting for more than half of global data center power consumption, compared with less than 20% today (Figure 1).

Figure 1: Power need trends for AI data centers and non-AI data centers

The data center industry is attempting to alleviate the power demand by moving away from traditional air-cooled systems and turning to more expensive but highly effective liquid cooling solutions. However, relying solely on advancements in external cooling is not enough. To manage these increasing power demands, AI hardware developers must also innovate within the system design itself, exploring more comprehensive avenues for power optimization.  

How Synopsys Foundation IP Enables Low-Power Development

While developing system-on-chips (SoCs), designers can perform power optimization at various stages of the design, including the architecture level, the implementation level, and the underlying technology level. Synopsys Foundation IP can help designers address these target areas (Figure 2). Power dissipation on an SoC is mainly attributed to dynamic power from circuit switching and leakage (or static) power. Dynamic power is dissipated when the processors are executing instruction workloads and is proportional to CV²f, where C is the switching capacitance, V is the operating voltage, and f is the clock frequency of the circuit. Leakage power is dissipated whether the processor is idle or active, and it scales with threshold voltage, transistor size, and temperature. Various power management techniques, such as power gating and dynamic voltage and frequency scaling (DVFS), are used at the architectural level to reduce total power. At the implementation and process technology levels, design optimization and careful management of the operating conditions of logic cells and embedded memories directly impact power consumption. Enabling logic cells and memories to operate at the lowest possible voltage while still maintaining the required performance, along with minimizing the capacitance on active nodes by using specialized cells, can contribute significantly to power savings.
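The quadratic dependence of dynamic power on voltage is the key lever behind DVFS. A minimal sketch, using illustrative values that are not from the article, shows why lowering the supply voltage saves far more power than lowering the frequency alone:

```python
# Minimal sketch (hypothetical operating points, not measured silicon data):
# dynamic switching power follows P_dyn = C * V^2 * f, so reducing voltage
# pays off quadratically while reducing frequency pays off only linearly.

def dynamic_power(c_farads: float, v_volts: float, f_hertz: float) -> float:
    """Dynamic switching power in watts: P = C * V^2 * f."""
    return c_farads * v_volts**2 * f_hertz

# Assumed operating points for a logic block with 1 nF of switching capacitance.
nominal = dynamic_power(c_farads=1e-9, v_volts=0.9, f_hertz=1e9)    # 0.9 V, 1 GHz
scaled  = dynamic_power(c_farads=1e-9, v_volts=0.6, f_hertz=0.7e9)  # DVFS: 0.6 V, 700 MHz

print(f"nominal: {nominal * 1e3:.0f} mW")   # 810 mW
print(f"scaled:  {scaled * 1e3:.0f} mW")    # 252 mW
print(f"savings: {(1 - scaled / nominal) * 100:.0f}%")  # 69%
```

Here a 30% frequency reduction combined with a 33% voltage reduction cuts dynamic power by roughly two-thirds, which is why parallel architectures that tolerate lower clock rates benefit so strongly from voltage scaling.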

Leveraging a wealth of experience and deep capability built over multiple generations of Foundation IP optimization, Synopsys can play a crucial role in enabling power optimization for AI SoCs. The advanced solutions offered by Synopsys Foundation IP include highly optimized, silicon-proven Logic Libraries, General Purpose IOs (GPIOs), and Embedded Memories. With the richest cell set in the industry, Synopsys Logic Libraries and IOs are co-optimized with Synopsys electronic design automation (EDA) tools to fully exploit process technology benefits and deliver optimal power, performance, and area (PPA) trade-offs. Synopsys memories incorporate key ML algorithm-specific features that translate to significant area and power savings for AI chips.

Figure 2: End-to-end energy efficient design flow

Let us dive deeper into how Synopsys Foundation IP helps reduce power dissipation, specifically for AI processors.

  • Specialized logic cells that can be pitch-matched to Synopsys memories: In AI processors, for both training and inference tasks, a huge portion of compute activity (70-90% or more) is dedicated to multiply-accumulate (MAC) operations, which are fundamental to matrix multiplications and convolutions. The Synopsys logic library offering comprises specialized, complex logic cells for AI processors, including support for MAC functionality. These cells include features such as fused multiply-add capability, which helps minimize net length and overall capacitance in the design, leading to a substantial reduction in dynamic power consumption. Equally important for AI chips is the integration of power-efficient memories. In ML models, especially for inference tasks, parameter weights are stored in memory and frequently accessed by MAC units for computations (Figure 3). Synopsys provides embedded memories that are pitch-matched to the MAC units, meaning the physical layout of memory and logic cells is aligned so that their dimensions and spacings are co-optimized. This integrated design strategy results in shorter interconnects, which have been shown to provide a 33% power reduction in certain applications.

Figure 3: (a) MAC unit block diagram (b) Memory read and write for a MAC unit (source: https://iopscience.iop.org/article/10.1088/1674-4926/42/1/013104)
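To make the central role of the MAC operation concrete, here is an illustrative software sketch (not vendor IP) of how every element of a matrix product is built from repeated multiply-accumulate steps, each of which reads a weight and an activation from memory:

```python
# Illustrative sketch: the multiply-accumulate (MAC) operation at the heart of
# matrix multiplication. Each output element is a running sum of
# weight * activation products, which is why MAC units and the memories
# feeding them dominate compute activity in AI processors.

def mac_dot(weights: list[float], activations: list[float]) -> float:
    """Dot product built from repeated multiply-accumulate steps."""
    acc = 0.0
    for w, a in zip(weights, activations):
        acc += w * a  # one multiply-add per weight/activation pair
    return acc

def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Matrix multiply: every output element is one MAC-driven dot product."""
    b_cols = list(zip(*b))  # transpose b so its columns are easy to iterate
    return [[mac_dot(row, col) for col in b_cols] for row in a]

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19.0, 22.0], [43.0, 50.0]]
```

A 2x2 product already requires 8 multiply-adds and 8 weight reads; at the scale of real ML models this access pattern is what makes pitch-matching memories to MAC units and shortening interconnects so valuable.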

  • Customizable ultra-low voltage libraries: Designing chips to operate at ultra-low supply voltages in advanced technologies, particularly below 0.5V, is extremely challenging and requires very diligent design and verification. However, the power benefits of a low supply voltage can be enormous, since reducing the voltage has a quadratic effect on dynamic power consumption. AI processors, which typically rely on massive parallelism for performance rather than high clock frequencies, can particularly benefit from an ultra-low voltage library. Synopsys enables low-power chip designs with its customizable ultra-low voltage logic libraries. The libraries are underpinned by high-quality, exhaustive verification, with advanced characterization techniques employed across a broad range of process, voltage, and temperature (PVT) conditions. Challenges at lower voltages include reduced noise margins and increased sensitivity to manufacturing variations. With a low supply voltage, the ability of a signal to change the state of the next stage in a circuit is weakened. This can lead to signals behaving more like pulses and taking longer to propagate through the circuit, which affects critical timing aspects, including setup and hold times. To account for this, designers should consider additional factors such as rail-to-rail pulse checks, extra timing margins for on-chip variation (OCV), high-sigma requirements for hold timing, and clock skew recommendations. Synopsys Foundation IP designers develop their cells with these variabilities in mind. The cells undergo high-sigma Monte Carlo simulations for robustness verification, and the careful use of the moment-based Liberty variation format (LVF) allows precise, detailed modeling of the probabilistic nature of manufacturing variations (Figure 4).

Figure 4: The increasing complexity of on-chip variation for low supply voltage
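The growing sensitivity to variation at low voltage can be illustrated with a toy Monte Carlo experiment. This is a hedged sketch using an assumed alpha-power-style delay model and a hypothetical threshold-voltage distribution, not foundry data or a signoff-grade characterization flow:

```python
# Hedged sketch (hypothetical model and numbers): at ultra-low supply voltage,
# cell delay becomes far more sensitive to threshold-voltage variation. A
# Monte Carlo run over an assumed Vth distribution illustrates why high-sigma
# verification and extra on-chip-variation (OCV) margin are needed at low Vdd.
import random
import statistics

random.seed(0)

def cell_delay(vdd: float, vth: float, k: float = 1.0) -> float:
    """Toy delay model: delay grows as Vdd / (Vdd - Vth)^2 (alpha = 2)."""
    return k * vdd / (vdd - vth) ** 2

def mc_delays(vdd: float, n: int = 50_000) -> list[float]:
    # Vth ~ N(0.30 V, 15 mV): an assumed process variation, not foundry data.
    return [cell_delay(vdd, random.gauss(0.30, 0.015)) for _ in range(n)]

for vdd in (0.9, 0.5):
    d = mc_delays(vdd)
    mu, sigma = statistics.mean(d), statistics.stdev(d)
    print(f"Vdd={vdd:.1f} V: mean delay={mu:.2f}, sigma/mean={sigma / mu:.1%}")
```

With the same Vth spread, the relative delay variation at 0.5V is roughly three times that at 0.9V in this model, which is the intuition behind the extra OCV margin, high-sigma hold checks, and moment-based LVF modeling described above.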

  • Logic cells with fractional drive strengths: Logic cells with higher drive strengths consume more power and tend to have higher leakage due to their larger transistors. For non-critical paths, which have already been optimized for power by using high threshold voltage (HVT) cells, further power reduction can be achieved by using cells with fractional drive strengths. The Synopsys logic library portfolio includes a range of these fractional drive strength cells, including cells with drive strengths of less than one.
  • Power optimization kits: To enhance power savings, Synopsys offers a Power Optimization Kit (POK) as part of its standard cell platform. The kit includes a variety of specialized logic cells designed to implement advanced power management techniques, such as power switches and isolation cells that help reduce static power consumption by enabling blocks to shut down when not in use. The kit also includes level shifters, which assist in dynamic power reduction by allowing different blocks to operate at different voltages, depending on their performance requirements. Additionally, the POK features multi-bit variants of isolation cells, retention flops, and level shifters, which help reduce net lengths and overall cell area.
  • Ultra-low leakage IOs: In AI SoCs, on-chip components operate at low voltages but must connect to off-chip components that operate at much higher voltages. Designing GPIOs that support such a voltage range is extremely challenging, and most companies resort to level shifters, adding unnecessary area and power to the design. Synopsys offers a comprehensive set of ultra-low leakage IOs that support voltages as low as 0.5V while also supporting a 1.8V IO supply, enhancing overall system reliability. AI SoCs are also larger in size, requiring stringent electrostatic discharge (ESD) protection standards. Synopsys provides IO solutions with strong ESD protection, capable of handling charged-device model (CDM) currents of up to 7A. This translates to more efficient, reliable, and cost-effective AI SoC designs.
  • Non-volatile memories and latch-based memories: Synopsys offers a broad portfolio of advanced memory solutions, including embedded magnetoresistive random-access memories (MRAMs) and resistive random-access memories (RRAMs), which provide significantly higher densities than traditional SRAMs. For read-dominated applications, such as storage of training data, replacing SRAM or off-chip DRAM with MRAM or RRAM can significantly improve system-level PPA. These non-volatile memories (NVMs) reduce silicon area and the number of components required. Additionally, because they do not need constant power to maintain their data state, unlike DRAMs, they eliminate the need for frequent refresh cycles, lowering static power consumption and reducing leakage currents. Synopsys also offers latch-based memories, which save significant area for smaller memory instances. These are particularly useful for AI functions such as activation and pooling, which require many small memory instances. In addition, Synopsys provides specialized multi-port memories that handle multiple memory access requests simultaneously, helping to alleviate memory bottlenecks and improve overall performance.
  • Sparsity and transposition support in memories: In many ML models, a substantial portion of the data to be computed consists of all-zero words, which can be skipped during read/write operations to save power. To leverage this data sparsity, Synopsys has introduced an innovative feature called WAZ (Word All Zero) in its memories. By detecting and skipping zero-valued words, this feature can reduce power consumption by up to 60%. Additionally, Synopsys has developed a method to store data in memory in a transposed format, aligning matrix elements in memory to match their access patterns during computations. As a result, matrix operations are executed more quickly, saving energy and improving overall efficiency.
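The idea behind a word-all-zero style optimization can be sketched in a few lines. The function and the savings model here are hypothetical illustrations of the concept, not the hardware implementation: if a memory word is entirely zero, the access can be skipped, so power roughly tracks the fraction of nonzero words.

```python
# Illustrative sketch of skipping all-zero words in a sparse weight memory
# (a hypothetical model of the concept, not production memory IP).

def accesses_needed(words: list[list[int]]) -> tuple[int, int]:
    """Return (performed, skipped) accesses when all-zero words are skipped."""
    skipped = sum(1 for w in words if all(v == 0 for v in w))
    return len(words) - skipped, skipped

# Sparse weight memory: each inner list is one memory word (e.g., 4 packed weights).
weights = [[0, 0, 0, 0], [3, 0, 1, 0], [0, 0, 0, 0], [0, 0, 0, 0], [2, 2, 0, 5]]
performed, skipped = accesses_needed(weights)
print(f"performed={performed}, skipped={skipped} "
      f"({skipped / len(weights):.0%} of accesses saved)")  # 60% of accesses saved
```

In this toy example, 3 of 5 words are all zero, so 60% of the accesses (and their associated switching energy) are avoided; in real pruned or quantized models the achievable savings depend on the sparsity pattern of the weights.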

Summary

As application requirements and AI technology evolve, the demand for computationally powerful and energy-efficient AI processors continues to grow. Both traditional GPU-based architectures and evolving optimized AI architectures are pushing the power efficiency curve to its limits. Traditional library and memory offerings optimized for CPUs and previous generations of GPUs can fall short of meeting the specialized needs of today's demanding AI SoC designs. As the leader in Foundation IP, Synopsys has been innovating for optimal PPA for more than 20 years, consistently delivering specialized solutions to meet the demanding and changing design needs of the semiconductor industry. Supported by a robust R&D team and skilled application engineers, Synopsys leverages its expertise in logic libraries, IOs, and embedded memories to deliver uniquely tunable solutions that enhance the full spectrum of AI chip capabilities.

Synopsys IP Technical Bulletin
