By Ken Brock, Product Marketing Manager, Synopsys
TSMC recently released its fourth major 28nm process into volume production: 28HPC Plus (28HPC+). Millions of production wafers have come out of TSMC’s first two 28nm process families (the poly SiON 28LP and the High-K Metal Gate 28HP/28HPL/28HPM). With 28HPC, TSMC optimized the process for the balance between performance and cost that mobile and consumer devices need, and then developed 28HPC+ to achieve further performance improvement and leakage reduction. Using a combination of these new process technologies and high-quality standard cell logic libraries designed specifically for them, designers can achieve their performance, power and area goals while mitigating schedule risk.
This article describes six areas where designers can take advantage of these new processes with the latest logic library technology to optimize the performance, power and area of their SoCs.
The combination of innovative process technology and library design capabilities, along with the latest EDA tool innovations and flows, enables SoC designers to use their design skills to produce the highest performance, lowest cost designs consuming the lowest power.
Logic libraries have traditionally been characterized at process/voltage/temperature (PVT) corners that reflect the typical P-channel and N-channel transistor performance, the statistically slowest performance (slow-slow, or SS, at 3 sigma), and the fastest performance (fast-fast, or FF, at 3 sigma). These corners are used to simulate typical expected performance, worst-case performance (for flip-flop setup) and best-case performance (for flip-flop hold), and they include the expected die-to-die, wafer-to-wafer and lot-to-lot variability to assure yield.
Because of reduced process variability, TSMC is able to deliver high-yielding silicon at a new corner called “slow-slow global” (SSG), providing a 10-15% performance boost over its previous 28HPM process, which required the more conservative SS signoff (Figure 1). The process variability improvement can enable processors to run 10-15% faster, so a 28HPC logic library must be able to support the additional dynamic power and electromigration requirements of operating circuits at higher speeds.
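To see why a faster signoff corner raises power and electromigration requirements, the standard first-order CMOS switching-power relation (power proportional to activity, capacitance, supply voltage squared, and frequency) can be sketched as below. The block parameters are hypothetical placeholders chosen for illustration, not TSMC or library data.

```python
def dynamic_power(activity, cap_farads, vdd_volts, freq_hz):
    """First-order CMOS switching power: P = activity * C * Vdd^2 * f."""
    return activity * cap_farads * vdd_volts ** 2 * freq_hz


# Hypothetical block parameters (placeholders, not foundry data).
BASE = {"activity": 0.15, "cap_farads": 2e-9, "vdd_volts": 0.9, "freq_hz": 1.0e9}


def power_increase(speedup):
    """Fractional rise in dynamic power when only the clock frequency scales."""
    base = dynamic_power(**BASE)
    faster = dynamic_power(**{**BASE, "freq_hz": BASE["freq_hz"] * (1 + speedup)})
    return faster / base - 1


for speedup in (0.10, 0.15):
    print(f"{speedup:.0%} faster clock -> {power_increase(speedup):.1%} more dynamic power")
```

In this simple model, at a fixed supply voltage the dynamic power (and therefore the average current through the power rails and cell pins) rises in proportion to the frequency gain, which is the margin the library has to absorb.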
Figure 1. TSMC 28HPC SSG corner signoff and 28HPM SS corner signoff
The 28HPC process variability improvements also reduce transistor leakage, so the 28HPC process shows a ~20% reduction in leakage compared to 28HPM, depending on process options and conditions (Figure 2).
Figure 2. TSMC 28HPC FFG corner signoff and 28HPM FFG corner signoff
With 28HPC+, TSMC improved the High-K Metal Gate process used on 28HPM and 28HPC with new doping profiles and by shaving a few atoms off the High-K metal gate, achieving a 15% performance improvement and a 25% leakage reduction.
Figure 3. The curve on the left shows the performance distribution of 28HPC and the right shows the performance distribution at 28HPC+. Note that these curves compare the same SSG corners
Figure 4. The curve on the right shows the leakage distribution of 28HPC and the left shows the leakage distribution at 28HPC+. Note that these curves compare the same FFG corners
Changes in TSMC design rules driven by process improvements enable logic libraries to be drawn with multiple gate lengths over a greater range than was possible with the TSMC 28HPM process.
At the same time, the new relaxed design rules remove some lithography steps and allow cells to be drawn with 30nm, 35nm, and 40nm gate lengths at a slightly larger gate-to-gate pitch, which expands the performance/leakage profile of each process implant variant.
Figure 5. The top diagram shows the 140nm pitch and 3 gate lengths of the TSMC 28HPC/HPC+ processes as compared to the 28HPM process on the bottom, enabling more space for contacts
This wider range of gate lengths and the associated lithography simplification enable designers to realize greater ranges of performance/leakage tradeoffs for their designs using 28HPC-optimized logic libraries and the latest leakage recovery features in synthesis and place-and-route tools.
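The idea behind leakage recovery can be illustrated with a minimal sketch: wherever a cell has timing slack to spare, swap it to a longer gate length that is slower but leaks less. The variant table below is hypothetical (the delay penalties and leakage ratios are invented for illustration), and the sketch treats each cell's slack independently, which real synthesis and place-and-route tools do not do because cells share timing paths.

```python
# Hypothetical per-variant data relative to the 30nm cell (not foundry numbers):
# (gate_length_nm, delay_penalty_ps, relative_leakage)
VARIANTS = [
    (30, 0.0, 1.00),
    (35, 4.0, 0.60),
    (40, 9.0, 0.40),
]


def recover_leakage(cell_slacks_ps):
    """For each cell, pick the longest gate length whose delay penalty fits its slack."""
    choice = {}
    for cell, slack in cell_slacks_ps.items():
        best = VARIANTS[0]  # cells without slack keep the fastest variant
        for variant in VARIANTS:
            if variant[1] <= slack:
                best = variant
        choice[cell] = best[0]
    return choice


def relative_leakage(choice):
    """Total leakage of the chosen variants relative to an all-30nm design."""
    leak = {length: rel for length, _, rel in VARIANTS}
    return sum(leak[length] for length in choice.values()) / len(choice)


slacks = {"u1": 0.0, "u2": 5.0, "u3": 20.0}  # hypothetical cell slacks in ps
picked = recover_leakage(slacks)
print(picked, relative_leakage(picked))
```

Having three drawn gate lengths per implant variant gives the optimizer finer steps between "fast and leaky" and "slow and low-leakage" than a two-length library would.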
Figure 6: At lower frequencies, a shorter library can be the optimal solution for some blocks
That said, TSMC’s relaxed design rules enable shorter cells to be more routable—providing higher utilization through improved pin access, if the logic library provider crafts the standard cell layouts to take advantage of the latest features designed into the place and route tools.
Combining the benefits of the TSMC 28HPC/HPC+ processes with innovative logic library design and optimized layout provides several advantages to design engineers creating blocks of digital logic from RTL through synthesis and place and route. Optimized logic library circuits, such as combinational cells, multi-setup/multi-delay flops and multi-bit flip-flops (MBFF), provide both area and performance benefits on the TSMC 28HPC/HPC+ processes.
Optimizing register-to-register paths requires a rich standard cell library that includes the appropriate functions, drive strengths, and implementation variants. Optimized functions (NAND, NOR, AND, OR, Inverter, buffers, XOR, XNOR, MUX, adders, compressors, etc.) are necessary for synthesis to create optimal designs and optimized layout techniques are needed to get the most out of the latest routing algorithms to eliminate congestion. Advanced synthesis and place-and-route tools can take advantage of varied drive strengths to optimally handle the different fanouts and loads created by the design topology and physical distances between cells.
The setup plus the delay time of flip-flops is sometimes referred to as the “dead” or “black hole” time. Like clock uncertainty, this time eats into every clock cycle that could otherwise be doing useful computational work. Multiple sets of high-performance flip-flops are required to optimally manage this dead time. Delay-optimized flops (multi-delay flops) rapidly launch signals into critical path logic clusters, while setup-optimized flops (multi-setup flops) capture registers to extend the available clock cycle in several increments. Synthesis and routing optimization tools can be effectively constrained to use these multi-setup/multi-delay flip-flop sets for maximum speed, resulting in a 15-20% performance improvement.
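A small worked example makes the dead-time budget concrete. The sketch below assumes a simple register-to-register path whose minimum clock period is the sum of the launch flop's clock-to-Q delay, the logic delay, the capture flop's setup time, and the clock uncertainty. All picosecond values are hypothetical and are not taken from any library datasheet, so the resulting gain is illustrative only.

```python
def useful_time_ps(period_ps, clk_to_q_ps, setup_ps, uncertainty_ps=0.0):
    """Time left for combinational logic after flop dead time and clock uncertainty."""
    return period_ps - clk_to_q_ps - setup_ps - uncertainty_ps


def max_frequency_ghz(logic_delay_ps, clk_to_q_ps, setup_ps, uncertainty_ps=0.0):
    """Highest clock frequency at which the path still meets setup."""
    period_ps = logic_delay_ps + clk_to_q_ps + setup_ps + uncertainty_ps
    return 1000.0 / period_ps


# Hypothetical numbers: a standard flop pair versus a delay-optimized launch flop
# combined with a setup-optimized capture flop on the same 400 ps logic cone.
baseline = max_frequency_ghz(logic_delay_ps=400, clk_to_q_ps=90, setup_ps=50, uncertainty_ps=40)
optimized = max_frequency_ghz(logic_delay_ps=400, clk_to_q_ps=60, setup_ps=20, uncertainty_ps=40)
print(f"baseline {baseline:.3f} GHz, optimized {optimized:.3f} GHz, "
      f"gain {optimized / baseline - 1:.1%}")
```

Because the dead time is paid on every cycle, trimming it on the launch and capture flops of critical paths raises the achievable frequency without touching the logic between them.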
Figure 7: Sequential cells are used to resolve high-performance core design challenges. Multiple flop variants enable targeted optimization
Figure 8: Combining two single-bit flops into a dual flop with shared clocking
Multi-bit flip-flops provide a set of additional flops that have been optimized for power and area with a minor tradeoff in performance and placement flexibility. The flops share a common clock pin, which decreases the overall clock loading of the N flops in the multi-bit flop cell, reduces area with a corresponding reduction in leakage, and reduces dynamic power on the clock tree significantly (up to 50% for a dual flop, more for quad or octal).
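The clock-load argument can be sketched with an idealized model in which clock-tree dynamic power at the flop pins is proportional to the total clock pin capacitance being driven. The function name and the assumption that the shared clock pin presents the same load as a single-bit flop's pin are illustrative; real cells will generally fall short of this bound.

```python
def clock_pin_savings(bits, single_pin_cap, shared_pin_cap):
    """Fractional reduction in clock pin load when `bits` single-bit flops
    (each with clock pin capacitance `single_pin_cap`) are replaced by one
    multi-bit cell whose shared clock pin has capacitance `shared_pin_cap`."""
    return 1.0 - shared_pin_cap / (bits * single_pin_cap)


# Idealized upper bound: the shared pin loads the clock like one single-bit pin.
for bits in (2, 4, 8):
    print(bits, clock_pin_savings(bits, single_pin_cap=1.0, shared_pin_cap=1.0))
```

Under that idealization the bound is 1 - 1/N, which matches the "up to 50%" figure for a dual flop and grows for quad and octal cells.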
Multi-bit flip-flops are typically used in blocks that are not in the critical path of the highest chip operating frequency. They range from small, bus-oriented registers of SoC configuration data that are only clocked at power up, to major datapaths that are clocked every cycle and with a number of variants in between. SoC designers use the replacement ratio, measured by how many of the standard flops in the design can be replaced by their multi-bit equivalents and the resulting PPA improvements, to determine their overall chip power and area savings. The single-bit flip-flops to be replaced with multi-bit flip-flops must have the same function (clock edge, set/reset, and scan configuration).
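As a rough illustration of the replacement ratio, the sketch below groups flops by function (clock edge, set/reset style, and scan configuration) and counts how many could be packed into multi-bit cells of a given width. The `Flop` record and its field values are hypothetical, and the sketch ignores placement proximity, clock domains, and timing criticality, all of which a real flow would also have to consider.

```python
from collections import Counter
from typing import NamedTuple


class Flop(NamedTuple):
    name: str
    clock_edge: str  # "rise" or "fall"
    set_reset: str   # e.g. "none", "async_reset"
    scan: bool


def replacement_ratio(flops, bits=2):
    """Fraction of single-bit flops that can be merged into `bits`-wide cells,
    merging only flops that share the same function."""
    if not flops:
        return 0.0
    groups = Counter((f.clock_edge, f.set_reset, f.scan) for f in flops)
    replaced = sum((count // bits) * bits for count in groups.values())
    return replaced / len(flops)


design = [
    Flop("r0", "rise", "none", True),
    Flop("r1", "rise", "none", True),
    Flop("r2", "rise", "none", True),
    Flop("r3", "rise", "async_reset", True),
    Flop("r4", "rise", "async_reset", True),
]
print(replacement_ratio(design, bits=2))
```

In this toy design one of the three plain flops has no partner, so four of the five flops are replaceable by dual-flop cells.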
Figure 9 shows a 32-bit processor being synthesized with a logic library for TSMC 28HPM (blue line) and again with the same library characterized to the TSMC 28HPC process (orange line), where you can see greater performance in less area. Including innovative cells such as those in the Synopsys High Performance Core Design Kit enables SoC designers to achieve smaller area for a given frequency and a higher top-end frequency, as seen in the dotted red and blue lines. Using the 28HPC+ process shifts this curve 15% to the right.
Figure 9: Comparing the 28HPM process with the 28HPC process using Synopsys logic libraries and adding the Synopsys HPC Design Kit libraries to harden a 32-bit processor by sweeping timing constraints for a synthesized block until the library can no longer close timing
Logic libraries for the TSMC 28HPC/HPC+ processes must be designed to be synthesized, placed, routed, validated and optimized by digital EDA tools for timing, power and design rule compliance through design flows integrated with synthesis, place and route, design rules constraints and other tools. Digital EDA tools and flows enable designers to take full advantage of the circuit innovations such as multi-bit flops and the compact layouts designed into the most efficient logic libraries.
TSMC’s 28HPC High-K Metal Gate process offers improvements in process rules and variability that enable smaller designs at higher performance using less power. TSMC’s new 28HPC+ process takes this improvement one step further and provides a hard-to-resist platform. Leading synthesis and place-and-route tools can best take advantage of these process improvements to meet demanding design specifications if they have the right set of logic libraries that take full advantage of these new process capabilities. The Synopsys DesignWare logic libraries for TSMC 28HPC/HPC+ and leading EDA tools are designed to enable SoC designers to push the limits of performance, area and power and fully utilize the capabilities of these new processes.
For more information visit: /dw/ipdir.php?ds=hpc-design-kit