The demand for application-specific systems-on-chip (SoCs) for compute applications is ever-increasing. Today, the diversity of requirements calls for a rich set of compute solutions across a wide range of process technologies. The resulting products may have very different, but equally demanding, power, performance, and area (PPA) requirements. These compute needs span IoT wearables and mobile application processors; AI inference engines, machine learning GPUs, and NPUs; hyperscale servers and high-performance compute engines in supercomputers at the highest performance end; networking and 5G/6G base stations; and crypto engines, automotive MCUs, and advanced driver assistance engines. This diversity leads to a wide range of processing requirements, but almost all share one common objective: extracting the maximum compute performance within the optimum energy profile. The precise engineering trade-offs required to support this multitude of specifications will inevitably be very design specific.
This article will discuss how the SoC design-specific needs for these diverse computing applications, encompassing High-Performance Compute (HPC) and AI, can be addressed for a broad range of processes with a rich, tool-aware Foundation IP solution that includes optimized circuitry, broad operating voltage range support and the flexibility to add customer-specific optimizations. The article will explain how designers can achieve the optimum PPA for their compute applications, whether that goal is the maximum possible performance or the best power-performance trade-off for their designs.
草榴社区 has developed a versatile, highly optimized High-Performance Core (HPC) Design Kit comprising a range of specially architected logic cells and memory cache instances that have been specifically optimized to help SoCs meet stretched performance and power goals.
While the diversity of compute applications might share the goal of achieving the best PPA, the environmental conditions and design constraints vary enormously. To meet the latest density and power requirements, high-performance compute and mobile application processors will harness the latest process nodes, such as 3nm and even 2nm, using complex implementation techniques like dynamic voltage scaling (DVS). This requires wide-range process, voltage, and temperature (PVT) support and may need custom characterization corners for targeted operating points. Automotive and networking compute applications might target slightly larger-geometry FinFET nodes, like 16nm, 12nm, 7nm, and 5nm, and they can also take advantage of the 草榴社区 HPC Design Kits to enhance PPA. Crypto engines, graphics processors, and consumer compute engines in 4nm and 6nm shrink processes can also benefit from 草榴社区 HPC Design Kits.
Figure 1 illustrates the optimized logic library circuits in the 草榴社区 HPC Design Kit that can significantly improve the performance and power envelope.
Figure 1: 草榴社区 HPC Design Kit components for processor PPA optimization
In building the HPC Design Kits, the 草榴社区 Foundation IP Team has carefully selected and tuned the logic and memory circuit architectures to optimize SoCs for the best PPA.
These optimized architectures culminate in a rich HPC Design Kit that meets the SoC optimization needs of high-performance, medium-performance, and highly power-constrained compute applications. In addition to the care taken with the architectural features, the HPC Design Kit includes dedicated cell sets to boost performance and reduce dynamic power. These cells can be classified into groups designed to minimize switched capacitance and ease routing constraints, and include complex combinational, sequential, and multi-bit cells, as well as cells with optimized timing arcs and delays.
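As a simple illustration of the benefit of complex cells (a generic example, not a description of a specific cell in the kit), consider an AND-OR-invert cell such as an AOI22, which realizes

Y = \overline{(A \cdot B) + (C \cdot D)}

in a single cell. Built from discrete gates, the same function would require two AND gates, an OR gate, and an inverter, with additional internal nets to route and switch; merging it into one cell reduces area, wiring capacitance, and the number of timing arcs the implementation tools must close.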
Figure 2 shows where the optimized logic circuits within the 草榴社区 HPC Design Kit are best utilized across computing classes for CPU, GPU, DSP, and CNN application processors.
Figure 2: Key logic components in the 草榴社区 HPC Design Kit
Figure 3: Complex combinational cells reduce area, routing congestion and power
Figure 4: Specialty flip-flops stretch performance and minimize power
The 草榴社区 HPC Design Kit also supports multiple standard cell architectures, with a wide range of VTs and channel lengths, to provide finer granularity for performance and power scaling. Some of the fastest application processors used in high-performance computing run at more than 4GHz. High-performance and ultra-high-performance library and memory options can be targeted for high-speed CPUs, while lower-performance blocks and performance-power-balanced processors can use the power-saving benefits of high-density and ultra-high-density library and memory architectures to achieve a lower power envelope. Leveraging such a broad and flexible range of options results in the best overall performance-power trade-off, and combining this with the extensive PVT support enables the 草榴社区 HPC Design Kit to provide a very extensive solution space.
Combining frequency modulation with voltage scaling using dynamic voltage and frequency scaling (DVFS) is a common approach to optimizing performance and power in advanced application processors. To support DVFS, memory instances and logic libraries must support a wide voltage range. DVFS and voltage scaling can enable performance boost modes that maximize frequency by taking advantage of super-overdrive and overdrive PVTs for short bursts of performance, while lower PVT clusters are supported to minimize overall power consumption in non-boost modes.
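As a first-order reminder of why voltage scaling is so effective, the dynamic power of a CMOS block can be approximated with the standard switching-power model (where \alpha is the activity factor and C_{eff} the effective switched capacitance):

P_{dyn} \approx \alpha \cdot C_{eff} \cdot V_{DD}^{2} \cdot f

Because the achievable clock frequency f also falls roughly in proportion to V_{DD} over much of the operating range, dynamic power under DVFS scales close to cubically with supply voltage, which is why even modest voltage reductions in non-boost modes deliver large energy savings.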
Ultra-low-voltage PVTs are supported for applications where power is critical and where the headline performance requirements are more modest but still challenging. Foundation IP that can scale efficiently across this wide voltage range provides a real advantage, reducing power when the core is operating at a lower load while still delivering high performance when needed. 草榴社区 Foundation IP supports a very extensive operating voltage range, from near-threshold (0.375V) to high voltage (1.15V), giving designers the flexibility to scale their designs across a broad voltage range and take full advantage of voltage scaling to reduce dynamic and leakage power.
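To put that range in perspective using the switching-power relationship above: scaling the supply from 1.15V down to 0.375V reduces the V_{DD}^{2} term by a factor of

\left(\frac{0.375}{1.15}\right)^{2} \approx 0.106

i.e., roughly a 9x reduction in dynamic energy per switching event, before any additional savings from the accompanying frequency reduction or from lower leakage. (The exact savings in a real design depend on the achievable frequency, switching activity, and process corner.)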
For HPC processors operating at very high frequencies, the cache memories have stringent access-time, setup, and hold-time requirements. The area and aspect ratio of the memory also play an important role in the block's floorplanning. These caches often need to be hand-crafted to provide the best PPA profile. The 草榴社区 HPC Design Kit is specifically designed to remove this bottleneck for SoC designers by providing expertly tuned cache instances, optimized beyond what is possible with a compiler.
As general computation needs increase, so does the need for accelerator blocks designed to perform specific processing tasks. The architectures of these accelerator blocks are highly structured and optimized for the best speed and power profile when processing a narrower set of specific operations, and they are often highly parallelized. AI accelerator blocks, now very common in the industry, are designed and optimized to execute AI algorithms efficiently. These algorithms rely on repetitive multiply-accumulate (MAC) operations; hence, the architectures are designed to optimize MAC throughput. Figure 5 shows a typical AI block. Like GPUs, AI accelerator blocks are highly parallelized to maximize data throughput, so the blocks can run at a lower frequency; the overall throughput gain comes from thousands of replicated cores operating concurrently. These accelerator blocks are very memory intensive and highly replicated, requiring highly specialized memory instances for the best overall performance. At 草榴社区, we have designed these specialized memories to cater to these applications' growing memory capacity and performance needs.
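To make the MAC-dominance point concrete, the short sketch below (a deliberately simplified Python illustration, not 草榴社区 code or a model of any specific accelerator) shows the inner loop of the matrix multiplication at the heart of most neural-network layers; nearly every operation it performs is a multiply-accumulate, which is why accelerator datapaths are built around large arrays of MAC units operating in parallel at modest clock frequencies.

# Simplified illustration: the core of a neural-network layer is a matrix
# multiply, and its inner loop is nothing but multiply-accumulate (MAC) operations.
def matmul_mac(a, b):
    """Multiply matrix a (M x K) by matrix b (K x N) using explicit MACs."""
    m, k = len(a), len(a[0])
    n = len(b[0])
    out = [[0.0] * n for _ in range(m)]
    macs = 0
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += a[i][p] * b[p][j]  # one MAC per iteration
                macs += 1
            out[i][j] = acc
    return out, macs

# Even a tiny 4x8 by 8x4 product needs 4 * 4 * 8 = 128 MACs; a single real
# layer can require billions, which an accelerator spreads across thousands
# of MAC units working concurrently.
a = [[1.0] * 8 for _ in range(4)]
b = [[0.5] * 4 for _ in range(8)]
result, mac_count = matmul_mac(a, b)
print(mac_count)  # 128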
Figure 5: 草榴社区 Memory IP for AI SoCs: Lower Power & Latency
Networks-on-chip (NoCs) carry high-intensity communication workloads at the SoC level. NoCs are, therefore, high-performance circuits with a high activity rate, and they are very power-hungry. They require high-performance single-port (1P), dual-port (2P), multi-port, and TCAM memories like those shown in Table 1 below.
Table 1: 草榴社区 Foundation IP for NoC applications
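As background on the TCAM entries mentioned above, the sketch below (an illustrative Python model, not a description of 草榴社区 IP behavior) shows the ternary match a TCAM performs in hardware: each entry stores a value and a care-mask, a search key matches when the masked bits agree, and all entries are compared in parallel, with the lowest-index (highest-priority) match returned.

# Illustrative software model of a ternary CAM (TCAM) lookup.
# Each entry is (value, mask): bits where mask=1 must match, bits where
# mask=0 are "don't care". Hardware compares all entries in parallel;
# here the priority encoder is modeled by returning the first match.
def tcam_lookup(entries, key):
    for index, (value, mask) in enumerate(entries):
        if (key & mask) == (value & mask):
            return index
    return None  # no entry matched

# Example: 8-bit entries, as used for address/flow classification.
entries = [
    (0b1010_0000, 0b1111_0000),  # matches any key starting with 1010
    (0b1010_1100, 0b1111_1111),  # exact match only
    (0b0000_0000, 0b0000_0000),  # wildcard catch-all entry
]
print(tcam_lookup(entries, 0b1010_1100))  # -> 0 (first, highest-priority match)
print(tcam_lookup(entries, 0b0111_0001))  # -> 2 (falls through to catch-all)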
The 草榴社区 Logic Libraries and Embedded Memories are a rich set of IPs, co-optimized with EDA tools, enabling fine-grained SoC optimization and implementation. This allows designers to achieve precise PPA tuning and avoid overdriving and unnecessary capacitance and routing overheads. High-drive, combinational, and sequential cells are optimized to minimize internal timing arcs and can be combined with multi-bit cells, which minimize switched capacitance, to offer excellent PPA trade-offs. Co-optimization with the EDA tooling ensures any innovative features are seamlessly accessible to SoC implementers building high-performance compute, AI, and other processing applications.
EDA view support and PVTs are aligned across 草榴社区 Logic Libraries and Memory Compilers for each node to ensure a trouble-free integration experience. The 草榴社区 Foundation IP is offered across a wide range of foundries and process nodes to achieve optimized PPA regardless of the customer's technology choice for the target application. Targeted customization can be supported to address any specific customer requirements.
Today's SoCs make great demands on implementation teams, requiring compute solutions that meet a wide range of requirements under diverse constraints. Whether at the very high-performance compute end of cloud infrastructure, or in high-end mobile, low-power AI, or very low-voltage crypto engines, the need to extract the maximum performance at a low voltage or within a defined power budget is immensely challenging. As part of the 草榴社区 Foundation IP portfolio, the 草榴社区 HPC Design Kit provides a versatile solution to meet that range of challenges, giving SoC designers a comprehensive offering to optimize performance and power across a wide solution space. It addresses the need for the highest CPU clock frequencies and provides optimized power trade-offs for middle- to lower-performance processor applications.
The challenges are not going away, but there is help in the form of the highly optimized logic library cells and embedded cache instances in the 草榴社区 HPC Design Kit.
For more information, visit 草榴社区 Foundation IP