By: Ken Brock, Product Marketing Manager, Synopsys
TSMC recently released its fourth major 16nm process into volume production: 16FFC (16nm FinFET Compact). This process provides an easy migration from 28nm processes along with significant performance, power and area advantages. To develop the most competitive systems-on-chip (SoCs) in this process, designers must choose optimized foundation IP building blocks (embedded memories and standard cell libraries) to achieve the highest SoC performance with the lowest power and area. With the combination of the 16FFC process and the right foundation IP, designers can develop SoCs for applications from high-end green servers and network processors to ultra-low power mobile devices, consumer products, and wearables, and everything in between.
This article describes seven ways designers can take advantage of this new process with the most advanced logic library and memory compiler technology to optimize the performance, power and area of their SoCs.
Process Scaling
In keeping with Moore’s Law and classic Dennard scaling, the 16FFC process offers a smaller transistor pitch (contacted poly pitch or CPP), a smaller interconnect metal pitch (wire to wire, via to wire and via to via) for routing, and smaller bitcells, which together provide a basic area reduction. Optimized IP layout innovations can take advantage of these smaller design rules while addressing the challenges of 16nm, which include higher wire resistance due to thin wires and the associated electromigration concerns for signal wires and the power grid. These must be addressed in both IP architecture and IP validation. As seen in Figure 1, with optimized foundation IP, 16FFC provides more than a 2x area benefit and more than a 30% performance improvement compared to 28nm.
Figure 1: Area vs. Performance – 28nm vs. 16nm for CPU
FinFETs provide higher saturation currents per unit area, which can be turned into improved performance through different circuit topologies that enable the use of shorter logic cells to close critical timing paths.
Reduced Gate Leakage but Increased Dynamic Power
16FFC offers a rich palette of threshold voltage (VT) options and channel lengths to cover a broad performance/leakage spectrum. Figure 2 plots logic gate performance vs leakage (log scale), showing the design tradeoffs that can be achieved using footprint-compatible standard cells at multiple VT/channel lengths.
Figure 2: Relative performance vs relative leakage per VT and channel length, 7.5 track (T) Ultra High Density
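The tradeoff in Figure 2 can be read as a selection problem: for each timing path, choose the lowest-leakage VT/channel-length variant that still meets the delay budget. The following is a minimal sketch of that idea in Python. The variant names and the relative delay and leakage figures are illustrative assumptions, not values from the 16FFC libraries, and real synthesis tools perform this optimization across whole paths rather than one cell at a time.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CellVariant:
    name: str                # hypothetical VT/channel-length label
    relative_delay: float    # 1.0 = fastest variant (assumed value)
    relative_leakage: float  # 1.0 = leakiest variant (assumed value)


# Illustrative numbers only; footprint-compatible variants of one cell.
VARIANTS = (
    CellVariant("ulvt_short", 1.00, 1.000),
    CellVariant("lvt_short", 1.15, 0.300),
    CellVariant("svt_short", 1.35, 0.080),
    CellVariant("svt_long", 1.50, 0.030),
)


def pick_variant(
    base_delay_ps: float,
    budget_ps: float,
    variants: Sequence[CellVariant] = VARIANTS,
) -> Optional[CellVariant]:
    """Return the lowest-leakage variant whose scaled delay fits the budget.

    base_delay_ps is the delay of the fastest variant; each variant's delay
    is modeled as base_delay_ps * relative_delay. Returns None if even the
    fastest variant misses the budget.
    """
    feasible = [v for v in variants if base_delay_ps * v.relative_delay <= budget_ps]
    if not feasible:
        return None
    return min(feasible, key=lambda v: v.relative_leakage)


if __name__ == "__main__":
    for budget in (100.0, 140.0, 200.0):
        choice = pick_variant(base_delay_ps=100.0, budget_ps=budget)
        print(budget, choice.name if choice else None)
```

Because the variants are footprint compatible, a swap of this kind does not disturb placement, which is what makes late-stage leakage recovery practical.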
Many mobile and Internet-of-Things (IoT) devices spend most of their time in a standby or sleep state, where the only power dissipated is leakage. A major advantage of FinFETs is that they are functional at much lower voltages, with, of course, an associated drop in performance. Leakage is roughly proportional to the supply voltage, so the savings at low voltages can be considerable.
Total power consists of dynamic power and leakage power. FinFETs have much less leakage than 28nm or other nodes but consume relatively more dynamic power due to the increased input capacitance of the fins and higher saturation currents. This shift in the balance between leakage and dynamic power can call for large changes to design paradigms carried over from 28nm SoCs. Figure 3 shows leakage power as a percentage of total SoC power from 180nm to 16nm. The trend relieves designers of much of the pressure to reduce leakage but puts more focus on reducing dynamic power at 16FFC.
Figure 3: Leakage as a percentage of total SoC power from 180nm to 16nm
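The quantity plotted in Figure 3 follows directly from the power split described above. As a rough sketch, the Python functions below compute leakage as a fraction of total power, plus the average power of a device that is active only part of the time. The function names and the simplifying assumptions (no dynamic power while asleep, constant leakage) are introduced here for illustration only.

```python
def leakage_fraction(dynamic_mw: float, leakage_mw: float) -> float:
    """Leakage as a fraction of total power, where total = dynamic + leakage."""
    total_mw = dynamic_mw + leakage_mw
    if total_mw <= 0:
        raise ValueError("total power must be positive")
    return leakage_mw / total_mw


def average_power_mw(active_fraction: float, dynamic_mw: float, leakage_mw: float) -> float:
    """Average power of a duty-cycled device.

    Assumptions (illustrative): dynamic power is dissipated only while
    active, and leakage is dissipated all the time at a constant level.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must be between 0 and 1")
    return active_fraction * dynamic_mw + leakage_mw


if __name__ == "__main__":
    # Hypothetical block: 90 mW dynamic, 10 mW leakage while running.
    print(leakage_fraction(90.0, 10.0))
    # The same block awake 10% of the time.
    print(average_power_mw(0.1, 90.0, 10.0))
```

For a mostly-asleep device the leakage term dominates the average even when it is a small share of active power, which is why standby-heavy IoT designs still care about the leakage end of the VT palette.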
Managing Dynamic Power = CFV²
Since SoC performance is mandated by the application spec, the designer-controllable levers on dynamic power are reducing switching frequencies through aggressive clock gating, minimizing capacitance, and minimizing operating voltages. Wiring capacitance is minimized with dense, optimized layouts and shorter wiring runs. Input capacitance can be minimized by using libraries optimized with the best cell heights for a given function at a given frequency. Standard cells can be built in multiple heights (3 fin, 4 fin and 5 fin) to match the target frequency of the block in both performance and reliability. Figure 4 shows the input capacitance of 1X drive inverters at the three different track heights (7.5T, 9T, 10.5T). Other cells would show similar trends.
Figure 4: Input capacitance of 1X inverter per standard cell architecture
Depending on the block function and frequency, a block built with the Ultra High Density (UHD) 7.5 track library will not reach the performance of the same block built with the High Density (HD) 9 track library, but will consume ~25% less power due to reduced device capacitance. In addition, dynamic power scales with V², so lowering the block voltage reduces it quadratically. Figure 5 plots leakage power (dotted line) and dynamic power (solid line) of comparable blocks at different nominal voltages. The reduction in dynamic power at low voltages is due to the V² component.
Figure 5: Performance vs leakage and dynamic power at multiple nominal voltages
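The CFV² relationship can be made concrete with a few lines of Python. This is a minimal sketch: the capacitance, frequency and voltage values are hypothetical, and the formula is the simple form used in this article, without a separate switching-activity factor.

```python
def dynamic_power_w(c_farads: float, f_hz: float, v_volts: float) -> float:
    """Dynamic power P = C * F * V^2 (switched capacitance, frequency, supply)."""
    return c_farads * f_hz * v_volts ** 2


def voltage_scaling_ratio(v_new: float, v_old: float) -> float:
    """Ratio of new to old dynamic power when only the supply voltage changes."""
    if v_old <= 0:
        raise ValueError("v_old must be positive")
    return (v_new / v_old) ** 2


if __name__ == "__main__":
    # Hypothetical block: 1 nF switched capacitance at 1 GHz and 0.8 V.
    print(dynamic_power_w(1e-9, 1e9, 0.8))
    # Dropping the same block from 0.8 V to 0.6 V at unchanged C and F.
    print(voltage_scaling_ratio(0.6, 0.8))
```

At a fixed frequency and voltage, power is linear in C, so a ~25% reduction in switched capacitance gives a ~25% reduction in dynamic power, while a voltage reduction compounds quadratically on top of that.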
Logic Library Design for Significant Block PPA Improvements
Combining the benefits of the new TSMC 16FFC process with optimized layout and innovative logic library circuit design gives several advantages to design engineers who create blocks of digital logic from RTL through synthesis and place and route. Routed block density is critical to saving both die area and power.
Efficient Layout for Minimum SoC Area and Minimum Total Power
Standard cell design is a complex process in which each circuit element, layout feature or tradeoff can have a major impact on a combination of performance, power, area (PPA) and manufacturability. Taking full advantage of process features such as continuous poly on diffusion edge (CPODE) enables routed blocks to be 5% smaller than a design using only poly on diffusion edge (PODE), for both minimum routed block area and minimum total power.
Combinational Cells
Optimizing register-to-register paths requires a rich standard cell library that includes the appropriate functions, drive strengths, and implementation variants. A rich set of optimized functions (NAND, NOR, AND, OR, inverter, buffers, XOR, XNOR, MUX, adders, compressors, etc.) is necessary for synthesis to create optimal circuits, and optimized layout techniques are needed to get the most out of the latest routing algorithms and eliminate congestion. Advanced synthesis and place-and-route tools can take advantage of a rich set of drive strengths to optimally handle the different fan-outs and loads created by the design topology and physical distances between cells.
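To illustrate why a fine-grained set of drive strengths matters, the sketch below picks the smallest drive strength that meets a delay target for a given load, using a simple linear delay model (delay = intrinsic delay + drive resistance × load capacitance). The model, the default parameter values and the function name are assumptions made for this example; production tools use characterized library timing tables instead.

```python
from typing import Iterable, Optional


def pick_drive_strength(
    load_ff: float,
    target_delay_ps: float,
    drives: Iterable[int] = (1, 2, 4, 8),
    intrinsic_ps: float = 10.0,
    r_unit_kohm: float = 4.0,
) -> Optional[int]:
    """Return the smallest drive strength that meets the delay target.

    Assumed model: a cell of drive strength d has output resistance
    r_unit_kohm / d, so delay = intrinsic_ps + (r_unit_kohm / d) * load_ff.
    (kOhm * fF = ps, so the units are consistent.)
    Returns None if no available drive strength is fast enough.
    """
    for d in sorted(drives):
        delay_ps = intrinsic_ps + (r_unit_kohm / d) * load_ff
        if delay_ps <= target_delay_ps:
            return d
    return None


if __name__ == "__main__":
    for load in (5.0, 10.0, 40.0):
        print(load, pick_drive_strength(load_ff=load, target_delay_ps=30.0))
```

Choosing the smallest adequate drive keeps input capacitance, and therefore dynamic power, as low as the timing target allows. A library with only coarse drive steps would force over-sizing on many nets.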
Sequential Cells
The setup time plus the delay time of flip-flops is sometimes referred to as the “dead” or “black hole” time. Like clock uncertainty, this time eats into every clock cycle that could otherwise be doing useful computational work. Multiple sets of high-performance flip-flops are required to optimally manage this dead time. Delay-optimized flops (multi-delay flops) rapidly launch signals into critical path logic clusters, and setup-optimized flops (multi-setup flops) act as capture registers that extend the available clock cycle in several increments. Synthesis and routing optimization tools can be effectively constrained to use these multi-setup/multi-delay flip-flop sets for maximum speed, resulting in a 15-20% performance improvement.
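The effect of dead time on a register-to-register path can be sketched with basic timing arithmetic: the clock period must cover the launch flop's delay, the logic delay, the capture flop's setup time, and any clock uncertainty. The Python below is an illustrative sketch; the picosecond values are hypothetical and are not meant to reproduce the 15-20% figure quoted above.

```python
def useful_logic_time_ps(
    period_ps: float,
    setup_ps: float,
    clk_to_q_ps: float,
    uncertainty_ps: float = 0.0,
) -> float:
    """Time left in each cycle for combinational logic after dead time."""
    return period_ps - (setup_ps + clk_to_q_ps) - uncertainty_ps


def max_frequency_ghz(
    logic_delay_ps: float,
    setup_ps: float,
    clk_to_q_ps: float,
    uncertainty_ps: float = 0.0,
) -> float:
    """Highest clock frequency for a path, from its minimum period in ps."""
    min_period_ps = logic_delay_ps + setup_ps + clk_to_q_ps + uncertainty_ps
    if min_period_ps <= 0:
        raise ValueError("minimum period must be positive")
    return 1000.0 / min_period_ps  # 1000 ps per ns, so the result is in GHz


if __name__ == "__main__":
    # Hypothetical path with 400 ps of logic between two flops.
    baseline = max_frequency_ghz(400.0, setup_ps=40.0, clk_to_q_ps=60.0)
    # Same path with a delay-optimized launch flop and a setup-optimized capture flop.
    optimized = max_frequency_ghz(400.0, setup_ps=20.0, clk_to_q_ps=30.0)
    print(baseline, optimized, optimized / baseline)
```

The shorter the logic depth between registers, the larger the share of the cycle taken by dead time, so high-frequency designs gain the most from these flop variants.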
Memory Compiler Design for Significant PPA Improvements
Optimized for low power, high performance and high density, DesignWare Memory Compilers offer advanced power management features such as light sleep, deep sleep, shut down, dual power rails, and write assist, allowing designers to meet the stringent low-power requirements of today's SoCs. DesignWare Memory Compilers are closely coupled with the DesignWare STAR Memory System, providing an integrated embedded memory test solution to detect and repair manufacturing faults for the highest possible yield with the least impact on chip area. DesignWare Memory Compilers are silicon-proven with billions of chips shipping in volume, enabling designers to reduce risk and speed time-to-market.
Figure 6: Broad portfolio of DesignWare Memory Compilers for a variety of applications
Summary
TSMC’s 16FFC process offers improvements in process rules and variability to enable smaller designs at higher performance, using less power. Leading synthesis and place and route tools can best exploit these improvements to meet demanding design specifications if they have the right set of logic libraries and embedded memories. The Synopsys DesignWare Logic Libraries, together with leading EDA tools, memory compilers and the complete line of interface IP, are designed to enable SoC designers to push the limits of performance, area and power and fully utilize the capabilities of this new process for SoCs with the smallest area and highest megahertz per milliwatt.
For more information, visit: /dw/ipdir.php?ds=hpc-design-kit