"How much is your Antutu score?" This used to be a relevant question for mobile SoC developers, but not any longer!
Over the last few years, the application processor domain has seen a race in the number of cores per system-on-chip (SoC). You will easily recall marketing for "dual core", then "quad core", followed by "octa core" (8), big.LITTLE, and even up to 10 cores. A similar story applies to GPUs. This race has been fueled mainly by the smartphone market.
However, the landscape is changing in several ways, and increasing the number of cores isn't necessarily the goal. Power considerations are gaining relevance as a factor when selecting an application processor chipset from a supplier, due to the following:
Power optimization and power estimation are two different approaches, but aim towards the same goal: reducing the overall power consumption. They differ in the way they look at the problem.
Power optimization techniques can be applied during the implementation of each part of the SoC hardware and software. Techniques vary depending on the nature of the part being implemented and optimized. For instance, EDA tools focus on RTL and gate-level optimization, whereas software profiling tools focus on the optimization of runtime libraries. Power measurement metrics for these kinds of optimizations require accurate knowledge of each component being optimized.
Power estimation techniques are applied before the implementation, or as a way to compare different implementation options and select the most appropriate one. As such, power estimation does not require detailed knowledge of each component of the SoC (such as implementation details of a hardware block or software module), only the power characteristics of each component. It turns out that this block-level view is good enough to compare alternative implementation options.
This article focuses on power estimation at the SoC block level.
Power estimation has two constituents: a static constituent, which is the characterization of the power numbers, and a dynamic constituent, which is the simulation of scenarios.
For handheld, battery-operated products managed with a 'best effort' strategy, these two constituents are key factors for the power estimation.
Virtual prototyping, using Virtualizer, is the best approach to produce these scenarios, as it enables:
For a detailed explanation of other Virtualizer use cases, refer to the virtual prototyping book Better Software. Faster!
For the customer in this project, prototyping using Virtualizer is an established flow. Enabling power estimation is a simple extension to the existing Virtualizer prototyping flow.
Virtualizer enables an incremental prototyping flow, which can be initiated very early in a new SoC development process:
Figure 1: CF-Bench executing on top of Android on the customer SoC VDK (Virtualizer Development Kit)
It is important to notice that even the initial VDK, shown in Figure 1, is making use of the latest ARM Cortex application processor version that only exists as a simulation model but not as stable RTL. This is important for power estimation as the software stimuli should be as close as possible to the final multicore reality.
The bring-up of the Linux software layer that manages the power is the first important step towards enabling the power estimation flow. This requires the availability of models for the PMU and CMU blocks (Power and Clock Management Unit). These SystemC models are either developed or can be reused from existing examples or previous projects.
Once the software is functionally correct, a power model overlay can be attached to these models (see Figure 2). One capability of the power model overlay is to provide warnings when the model is in a state in which the IP block cannot be accessed (e.g. if a hardware block is in retention mode or powered down, the software cannot read a status register).
Figure 2: Power management software bring-up using Virtualizer
Extending the platform with the power overlay model makes it possible to execute and test various corner cases related to power.
Once the power overlay model is available, the different scenarios for the power estimation can be prepared.
These scenarios must be driven from the application level and capture how an end user interacts with the smartphone in the real world. For performance scenarios, there is a plethora of benchmarks (BBench, Antutu, etc.). However, these benchmarks do not cover the full spectrum of real-world user interaction and thus need to be complemented by other scenarios, such as web browsing, music playback, and navigation (Figure 3).
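The access-warning behavior of a power overlay can be sketched in a few lines of Python. This is only an illustration of the idea, not the Virtualizer API: the class `PowerAwareBlock`, the `PowerState` names, and the `STATUS` register are all hypothetical.

```python
import warnings
from enum import Enum


class PowerState(Enum):
    ON = "on"
    RETENTION = "retention"
    POWER_DOWN = "power_down"


class PowerAwareBlock:
    """Hypothetical IP block model with a power-state overlay (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.state = PowerState.ON
        self.registers = {"STATUS": 0x1}  # assumed register map

    def set_state(self, state):
        self.state = state

    def read(self, reg):
        # The overlay flags register reads made while the block is not accessible.
        if self.state is not PowerState.ON:
            warnings.warn(
                f"{self.name}: read of {reg} while in {self.state.value}",
                RuntimeWarning,
            )
            return None
        return self.registers[reg]
```

A software corner case, such as a driver polling a status register after the block has entered retention, then surfaces as a warning during simulation instead of going unnoticed.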
Figure 3: Realistic scenario of web browsing with rich feature, and power state analysis tracing
Now these realistic scenarios together with the power overlay model can be used to estimate the duration each hardware block is in a certain power state. This is the dynamic part of the power estimation.
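As a minimal sketch of this dynamic part, the following Python turns a list of power-state transitions into the time each block spends in each state. The trace format `(time, block, state)`, the function name `state_residency`, and the sample trace are assumptions made for illustration; they do not reflect the actual Virtualizer trace format.

```python
from collections import defaultdict


def state_residency(trace, end_time):
    """Return {(block, state): seconds} from (time, block, state) events.

    `trace` must be sorted by time; each event marks the moment a block
    enters a new power state. `end_time` closes the last open interval.
    """
    residency = defaultdict(float)
    current = {}  # block -> (state, time at which the state was entered)
    for time, block, state in trace:
        if block in current:
            prev_state, since = current[block]
            residency[(block, prev_state)] += time - since
        current[block] = (state, time)
    # Account for the state each block is still in when the scenario ends.
    for block, (state, since) in current.items():
        residency[(block, state)] += end_time - since
    return dict(residency)


# Made-up trace, for illustration only.
trace = [
    (0.0, "cpu_cluster", "active"),
    (0.0, "gpu", "power_down"),
    (2.0, "gpu", "active"),
    (3.0, "cpu_cluster", "retention"),
    (5.0, "gpu", "power_down"),
]
residency = state_residency(trace, end_time=6.0)
```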
Let’s now go back to the static characterization part of power estimation.
Each power state is characterized by an estimated power value in mW. RTL is not available early in the design, so estimation based on RTL or gate-level power estimation technologies (e.g. based on switching activity) cannot be done. The alternative approach selected is to gather information from the ASIC teams based on last-generation SoCs.
To set up this flow, the characterization was applied to a known design already implemented in silicon, and the results were compared: simulated power model vs. real silicon (see Figure 4).
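To show how the static and dynamic constituents combine, here is a small Python sketch that weights each state's mW value by the time spent in that state. The function name `average_power_mw`, the dictionary layout, and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def average_power_mw(residency_s, state_power_mw):
    """Time-weighted average power per block, in mW.

    residency_s:    {(block, state): seconds spent in that state}
    state_power_mw: {(block, state): characterized power in mW}
    """
    energy = {}    # block -> accumulated mW * s (i.e. mJ)
    duration = {}  # block -> total seconds observed
    for (block, state), seconds in residency_s.items():
        energy[block] = energy.get(block, 0.0) + state_power_mw[(block, state)] * seconds
        duration[block] = duration.get(block, 0.0) + seconds
    return {b: energy[b] / duration[b] for b in energy if duration[b] > 0}


# Made-up characterization and residency values, for illustration only.
state_power_mw = {
    ("cpu_cluster", "active"): 600.0,
    ("cpu_cluster", "retention"): 20.0,
    ("gpu", "active"): 400.0,
    ("gpu", "power_down"): 0.0,
}
residency_s = {
    ("cpu_cluster", "active"): 3.0,
    ("cpu_cluster", "retention"): 1.0,
    ("gpu", "active"): 3.0,
    ("gpu", "power_down"): 3.0,
}
avg = average_power_mw(residency_s, state_power_mw)
```

With these sample values, the GPU is active half of the time, so its average power is half of its active power.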
Information from ASIC teams came as basic power equations separating dynamic power and leakage.
In this initial project for setting the power estimation flow, the focus was mainly on dynamic power, since it is the most directly impacted by the scenarios.
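The article does not give the equations supplied by the ASIC teams. As a point of reference only, the commonly used first-order CMOS power model separates the two contributions as follows, where the activity factor $\alpha$, effective switched capacitance $C_{\mathrm{eff}}$, supply voltage $V_{dd}$, clock frequency $f$, and leakage current $I_{\mathrm{leak}}$ are the usual textbook symbols and not values from this project:

```latex
P_{\mathrm{total}} = P_{\mathrm{dyn}} + P_{\mathrm{leak}}, \qquad
P_{\mathrm{dyn}} = \alpha \, C_{\mathrm{eff}} \, V_{dd}^{2} \, f, \qquad
P_{\mathrm{leak}} = V_{dd} \, I_{\mathrm{leak}}
```

In this form, the scenario mainly influences the dynamic term through $\alpha$ and, with voltage and frequency scaling, through $V_{dd}$ and $f$, which is consistent with the focus on dynamic power described above.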
Figure 4: Power analysis using Virtualizer
To compare the measurement results, various Android benchmarks were executed on the VDK version of the SoC and on the final silicon.
The screenshots below (Figures 5 and 6) show the measurements and correlation of the various tests.
Figure 5: Comparing BBench power versus CF-Bench power using a VDK
Figure 6: Comparison of VDK and real silicon – cluster power consumption
Comparing the VDK with the real silicon showed a strong correlation for most of the test scenarios.
However, the absolute power figures were quite off:
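A strong correlation and a large absolute error can coexist, which the following Python sketch illustrates. The helper names and all numbers are made up for illustration and are not results from this project: the estimates track the measurements perfectly in trend but are uniformly too high.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_relative_error(estimated, measured):
    """Average of |estimate - measurement| / measurement."""
    return sum(abs(e - m) / m for e, m in zip(estimated, measured)) / len(measured)


# Made-up per-scenario power values in mW, NOT project results.
vdk_mw = [300.0, 450.0, 600.0, 900.0]
silicon_mw = [200.0, 300.0, 400.0, 600.0]

r = pearson(vdk_mw, silicon_mw)
err = mean_relative_error(vdk_mw, silicon_mw)
```

In a situation like this, the estimates remain useful for ranking scenarios or comparing implementation options even though the absolute values would need recalibration.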
The project represented a breakthrough for the customer and proved that virtual prototyping tools can be used to set up a consistent and sustainable power estimation flow. Using Virtualizer enabled the customer to:
Looking forward, the customer plans to expand the scope of this methodology to better cover certain hardware accelerators, revisit leakage, and possibly investigate thermal analysis.
Also, now that has been announced as a standard, the ability to more easily exchange information for the characterization from ASIC and third-party IP suppliers is much anticipated.