Synopsys

Enhancing Chip Design Simulation with AI

Taruna Reddy

Feb 15, 2023

Simulation remains the workhorse technology for functional verification of register-transfer level (RTL) chip designs. In a typical flow, static verification runs early in the chip design process and looks for structural bugs such as clock domain crossing (CDC) and reset domain crossing (RDC) errors. Static analysis finds approximately 10% of all design bugs. Formal verification, applied primarily at the block level, typically detects 20% of bugs. The simulation phase is where 65% of the total bugs are caught, with the final 5% found using emulation and prototyping.

On the simulation front, the key challenges are performance, debug turnaround time (TAT), and coverage closure. Regressions must be rerun whenever the RTL design changes, so simulator performance must be high enough that it does not cause project delays further down the line. With Moore’s law slowing, performance can no longer be improved dramatically just by running on the latest compute servers.

Artificial intelligence (AI) and machine learning (ML) provide an effective way to improve performance beyond upgrading hardware, by optimizing the selection of the many switches available in the Synopsys VCS® simulator. That is the focus of this blog post, but it is important to note that AI/ML have also been successfully employed to speed debug TAT, through regression debug automation in the Synopsys Verdi® Automated Debug System for binning, clustering, and triaging failures, and to accelerate coverage closure in the Synopsys VCS environment.

Adjusting Simulator Options Via ML

There are many simulator switches, design feature-related options, and regression settings that affect performance. Arriving at the optimal set manually is time-consuming and requires significant expertise about the simulator and user environment. Typically, no one person has all this knowledge, leading to inefficiency and wasted cycles in optimizing the simulator settings. The available options cover both design/testbench compilation and simulation runtime. When performance is already an issue, repeated compilations and runs using different variations in settings add even more time to the schedule.

Even if the user is willing to make this effort, it is not a one-time investment. As the design and testbench evolve and more regressions are run, the settings need to be readjusted to maintain peak performance. Using ML to learn simulator options and automatically adjust them as needed improves regression performance and efficiency. The Dynamic Performance Optimization (DPO) technology inside the Synopsys VCS simulator uses ML to learn from prior regressions and tunes the simulator settings accordingly without user input.

The user can set the frequency of the learn phase based on factors such as RTL/testbench updates, decrease in performance over time, and debug capabilities. As the learnings are applied to multiple regressions, DPO leads to an overall reduction in regression TAT.
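The learn-then-apply pattern described above can be illustrated with a toy sketch. This is not how DPO works internally: a brute-force search stands in for the ML model, the knob names (`opt_level`, `parallel_jobs`, `coverage_mode`) are hypothetical and are not real VCS switches, and `measure_runtime` is a synthetic cost model standing in for actually running and timing a regression.

```python
from itertools import product

# Hypothetical tuning knobs for illustration only; NOT real VCS switches.
SEARCH_SPACE = {
    "opt_level": [0, 1, 2],
    "parallel_jobs": [1, 2, 4],
    "coverage_mode": ["full", "sampled"],
}


def measure_runtime(settings):
    """Synthetic stand-in for running a regression and timing it."""
    runtime = 100.0 - 8.0 * settings["opt_level"]
    runtime /= settings["parallel_jobs"] ** 0.5
    if settings["coverage_mode"] == "sampled":
        runtime *= 0.9
    return runtime


def learn_phase(search_space, measure):
    """Try every candidate combination and keep the fastest one.

    Exhaustive search is used here only because the toy space is tiny;
    it is a placeholder for a learned model.
    """
    names = list(search_space)
    best_settings, best_time = None, float("inf")
    for values in product(*(search_space[name] for name in names)):
        candidate = dict(zip(names, values))
        elapsed = measure(candidate)
        if elapsed < best_time:
            best_settings, best_time = candidate, elapsed
    return best_settings, best_time


def apply_phase(settings, measure, runs):
    """Reuse the learned settings for later regressions, with no new search."""
    return [measure(settings) for _ in range(runs)]


best, best_time = learn_phase(SEARCH_SPACE, measure_runtime)
default = {"opt_level": 0, "parallel_jobs": 1, "coverage_mode": "full"}
print("learned settings:", best)
print("speedup vs default:", round(measure_runtime(default) / best_time, 2))
```

The point of the split is cost: the learn phase is expensive because it runs many variations, while the apply phase pays nothing extra, so the learn phase only needs to be repeated when the design, testbench, or observed performance changes.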

Depending on the type of design (gate-level, RTL, or low-power) and the performance bottleneck (compile time or runtime), the appropriate DPO application (app) can be used. With every release of the VCS simulator, new DPO apps will be introduced to target different aspects of performance.

Case Studies: Real-World Uses of VCS DPO Technology

An interesting case study of DPO was presented by Vishwanath (Vish) Gunge of Microsoft at Synopsys Verification Day 2021. Because sanity regressions are run several times a day, any optimization contributes to more efficient use of compute resources. The learn phase runs were about 30% slower than the baseline, but they were needed only when triggered by the factors mentioned earlier. Since application runs averaged 25% faster, the team could run roughly 30% more sanity regressions per day without adding compute power.
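A quick back-of-the-envelope check shows how these figures fit together. The sketch below assumes that "25% faster" means an application run takes 0.75x the baseline wall time and that "30% slower" means a learn run takes 1.30x the baseline. The fraction of runs spent in the learn phase is not given in the presentation summary, so the values tried here are hypothetical.

```python
def relative_throughput(apply_time=0.75, learn_time=1.30, learn_fraction=0.0):
    """Regressions per day relative to baseline, where a baseline run takes 1.0.

    learn_fraction is the (assumed) share of runs executed in learn mode.
    """
    if not 0.0 <= learn_fraction <= 1.0:
        raise ValueError("learn_fraction must be between 0 and 1")
    avg_time = learn_fraction * learn_time + (1.0 - learn_fraction) * apply_time
    return 1.0 / avg_time


for frac in (0.0, 0.05, 0.10):
    gain = (relative_throughput(learn_fraction=frac) - 1.0) * 100
    print(f"learn runs = {frac:.0%} of total: {gain:.0f}% more regressions per day")
```

Under these assumptions, application-only runs would allow about 33% more regressions per day, and an occasional learn run pulls that figure down toward the roughly 30% reported.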

A Synopsys Users Group (SNUG) presentation (login required) reported similarly impressive results. On a production system-on-chip (SoC) project, Synopsys applications and R&D engineers had previously worked with the users to optimize the simulator settings, reducing regression TAT by 1.4x. When Synopsys VCS DPO was applied on top of those settings, regression TAT was reduced by a further 1.13x beyond the results of considerable manual effort, for a net improvement of 1.58x. When DPO was applied starting from the default simulator settings, regression TAT improved by the same 1.58x with no manual optimization effort at all.
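The net figure follows from the fact that successive speedups multiply rather than add. The short check below assumes that "reduced by Nx" means TAT is divided by N; the function name is illustrative.

```python
import math


def compound_speedup(*factors):
    """Combine successive speedup factors; each one divides the remaining TAT."""
    return math.prod(factors)


net = compound_speedup(1.40, 1.13)  # manual tuning, then DPO on top
tat_reduction_pct = (1.0 - 1.0 / net) * 100

print(f"net speedup: {net:.2f}x")  # 1.58x
print(f"regression TAT reduced by about {tat_reduction_pct:.0f}%")
```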

Most recently, at SNUG Singapore 2022, users presented a real-world case study and reported a 25% improvement in performance when using DPO in their simulation regression runs. The broad array of apps and the fully automated process mean that any Synopsys VCS user can improve regression TAT by optimizing simulation settings.

Conclusion

Performance tuning, debug, and coverage closure are three areas identified so far where AI/ML and automation have been successfully used to address the challenges of traditional manual processes. This trend is only going to grow as the amount of regression data continues to explode and the scope of verification challenges evolves. To learn more about improving performance automatically with DPO, read our white paper. Stay tuned to this blog for more updates about innovative technologies in the Synopsys VCS simulator.
