This bi-annual newsletter provides you with easy access to ASIP-related resources. This issue includes the following topics:
ASIP Designer comes with a rich set of example processor models provided in source code, which serve both as a modeling reference and as a starting point for customer designs.
In previous issues of the eUpdate Newsletter we covered the subject of data-level parallelism (June 2016) and instruction-level parallelism (Jan 2017), including the corresponding example models. In the October 2017 issue we looked at example processor models that demonstrate how to do a fast context switch. In this issue, we will conclude the series on example models, looking at models that highlight how a processor can be tuned towards very specific algorithmic requirements, featuring all the concepts covered in the previous editions.
Featuring: Tgauss
Tgauss is an example processor that uses the Gaussian Filtering application to illustrate processor concepts suited for certain image processing kernels.
Applying Gaussian Filtering to images results in two-dimensional matrix operations. The specifics of the algorithm allow for a separation into two 1-D filters, one performing the horizontal phase while the other handles the vertical phase. An efficient implementation depends directly on organizing the register files as line buffers and on exploiting vector-based processing, i.e. applying SIMD concepts.
Figure 1: Tgauss data path
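The separation into a horizontal and a vertical 1-D phase can be checked with a small sketch. The Python code below is an illustration only, not the Tgauss source: the function names, the zero padding at the image borders, and the integer binomial kernel (a common approximation of a Gaussian) are assumptions made for this example.

```python
# Illustrative sketch: a separable 2-D filter computed as two 1-D passes.
# KERNEL is an unnormalized binomial approximation of a Gaussian (assumption).
KERNEL = [1, 4, 6, 4, 1]


def convolve_rows(image, kernel):
    """Horizontal 1-D pass over every row, with zero padding at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [
            sum(
                kernel[k] * row[x + k - radius]
                for k in range(len(kernel))
                if 0 <= x + k - radius < width
            )
            for x in range(width)
        ]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns so the row pass can be reused vertically."""
    return [list(column) for column in zip(*image)]


def separable_filter(image, kernel):
    """Horizontal phase followed by vertical phase."""
    horizontal = convolve_rows(image, kernel)
    return transpose(convolve_rows(transpose(horizontal), kernel))


def direct_filter(image, kernel):
    """Reference 2-D filter using the outer product kernel[j] * kernel[i]."""
    radius = len(kernel) // 2
    height, width = len(image), len(image[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = 0
            for j in range(len(kernel)):
                for i in range(len(kernel)):
                    yy, xx = y + j - radius, x + i - radius
                    if 0 <= yy < height and 0 <= xx < width:
                        acc += kernel[j] * kernel[i] * image[yy][xx]
            out[y][x] = acc
    return out
```

Both functions produce the same image, but the direct form needs k x k multiply-accumulates per pixel and channel while the separable form needs 2k. For a 9x9 kernel that is 81 versus 18.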
Some of the design decisions taken for Tgauss:
Tgauss comes in two versions: one performing horizontal and vertical phases in an iterative sequence, the second performing two horizontal phases followed by two vertical phases, reducing the line buffer memory load traffic by a factor of 2.
To illustrate the performance, the result for a 9x9 filter, RGB processing:
Smaller filters need fewer cycles, e.g. 3x3 separable ~ 6 cycles/pixel.
The example indicates options for further speed-up:
Featuring: Tcom8
Tcom8 is an example processor that illustrates processor concepts suited for certain communication kernels, especially those containing FFT and FIR operations.
Starting from a scalar processor model, the datapath architecture extension for Tcom8 adds SIMD vector operations on 8-component vectors, and has the following new storages:
The vector memory is split into two parts, VMA and VMB, and allows simultaneous read and write accesses to different parts: (read VMA || write VMB) or (write VMA || read VMB).
A 16 x 16-bit vector multiplication consumes two cycles and allows for:
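A split memory of this kind lends itself to a ping-pong scheme, in which each pass of a multi-pass algorithm reads from one part and writes to the other, and the roles swap for the next pass. The sketch below models that idea in Python. It is not Tcom8 code: the function name, the use of Python lists as banks, and the pass functions are hypothetical, and only the bank names and the 8-component vector width come from the text.

```python
# Illustrative model of ping-pong use of a split vector memory (VMA / VMB).
LANES = 8  # vector width taken from the text; everything else is assumed


def run_passes(samples, passes):
    """Apply each pass vector by vector, alternating the source bank.

    Returns the final data and a log of which bank was read and written
    in each pass.
    """
    if len(samples) % LANES != 0:
        raise ValueError("sample count must be a multiple of the vector width")
    banks = {"VMA": list(samples), "VMB": [0] * len(samples)}
    src, dst = "VMA", "VMB"
    schedule = []
    for vector_op in passes:
        for base in range(0, len(samples), LANES):
            vector = banks[src][base:base + LANES]            # vector read
            banks[dst][base:base + LANES] = vector_op(vector)  # vector write
        schedule.append(f"read {src} || write {dst}")
        src, dst = dst, src  # swap roles for the next pass
    return banks[src], schedule
```

Because a pass never reads and writes the same bank, a load for one vector and a store for another can in principle be issued together, which is the access pattern the text describes.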
In addition, specific instructions and parallel formats have been provided to efficiently map FFT and FIR applications (as explained below).
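As a generic illustration of how an FIR filter can be mapped onto 8-component vectors, the Python sketch below computes eight outputs per iteration by broadcasting one coefficient at a time over a shifted window of input samples. This is a common SIMD formulation and an assumption of this example; it does not describe the actual Tcom8 instructions or parallel formats, and the function names are hypothetical.

```python
# Illustrative sketch: FIR filter computed in blocks of 8 outputs.
LANES = 8  # vector width taken from the text


def fir_scalar(x, h):
    """Reference FIR: y[n] = sum over k of h[k] * x[n - k], with x[m] = 0 for m < 0."""
    return [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]


def fir_vector(x, h):
    """Same filter, producing LANES outputs per outer iteration."""
    if len(x) % LANES != 0:
        raise ValueError("input length must be a multiple of the vector width")
    # Prepend zeros so every window read stays inside the buffer.
    padded = [0] * (len(h) - 1) + list(x)
    y = []
    for base in range(0, len(x), LANES):
        acc = [0] * LANES
        for k, coeff in enumerate(h):
            # Window holding x[n - k] for the 8 outputs n = base .. base + 7.
            start = base + len(h) - 1 - k
            window = padded[start:start + LANES]
            # One vector multiply-accumulate with a broadcast coefficient.
            acc = [a + coeff * s for a, s in zip(acc, window)]
        y.extend(acc)
    return y
```

Each inner iteration corresponds to one vector multiply-accumulate, so a filter with T taps needs T such operations per block of eight outputs in this formulation.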
To illustrate the performance of the core (and of course of the automatically generated compiler), the model comes with example code for FFT and FIR:
Featuring: SHA 256
This model highlights the design of a programmable accelerator as an alternative to fixed-function RTL, using the SHA-256 cryptographic hash function as an example. SHA-2 comes in different variants (SHA-224, 256, 384, 512, 512/224, 512/256), making it a good candidate for a flexible (because programmable), yet dedicated crypto engine.
As with all models, the SHA 256 crypto processor comes in source code. In addition, the model library includes a slide deck that describes the design process, starting from an existing 32-bit MCU, all the way to the final SHA 256 architecture.
Some of the architecture design steps taken:
Figure 2: SHA256 transform function
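For readers who want a concrete reference for the transform function, the following is a minimal sketch of the standard SHA-256 algorithm as defined in FIPS 180-4, written in plain Python. It is not the source code of the example model and says nothing about how the ASIP partitions the work. The helper names (rotr, transform, sha256) are choices made for this sketch, and the constants are derived from the square and cube roots of the first primes, as the standard defines them, rather than typed in as a table.

```python
import math
import struct

MASK = 0xFFFFFFFF  # all arithmetic is modulo 2**32


def rotr(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK


def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes, n = [], 2
    while len(primes) < count:
        if all(n % p for p in primes):
            primes.append(n)
        n += 1
    return primes


def icbrt(n):
    """Exact integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << (n.bit_length() // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo


PRIMES = first_primes(64)
# First 32 fractional bits of the square roots of the first 8 primes.
H0 = [math.isqrt(p << 64) & MASK for p in PRIMES[:8]]
# First 32 fractional bits of the cube roots of the first 64 primes.
K = [icbrt(p << 96) & MASK for p in PRIMES]


def transform(state, block):
    """Process one 64-byte block and return the updated 8-word state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 64):  # message schedule expansion
        s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):  # 64 rounds
        big_s1 = rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(s + v) & MASK for s, v in zip(state, (a, b, c, d, e, f, g, h))]


def sha256(message):
    """Pad the message, run the transform per block, return the 32-byte digest."""
    padded = (
        message
        + b"\x80"
        + b"\x00" * ((55 - len(message)) % 64)
        + struct.pack(">Q", 8 * len(message))
    )
    state = list(H0)
    for offset in range(0, len(padded), 64):
        state = transform(state, padded[offset:offset + 64])
    return struct.pack(">8I", *state)
```

The rotate, shift and XOR combinations in the two loops each take several instructions on a generic 32-bit core, which presumably makes them natural candidates for dedicated instructions in a design like this one.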
The example illustrates the typical tradeoff analysis between performance (here measured as throughput in cycles/byte) and the required area. Such architectural exploration is at the heart of almost any ASIP design. Fundamental to this approach is the immediate availability of a cycle-accurate instruction-set simulator, an optimizing C/C++ compiler, and the ability to go to synthesis. ASIP Designer is the ideal tool for this task.
Figure 3: Architectural exploration of SHA256 processor
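One way to read an exploration of this kind is as a search for Pareto-optimal design points: architectures for which no other candidate is at least as good in both throughput and area and strictly better in one. The Python sketch below shows that selection. The type and function names are hypothetical, and the numbers used in the usage example are made-up placeholders, not results from the SHA 256 model.

```python
from typing import NamedTuple


class DesignPoint(NamedTuple):
    name: str
    cycles_per_byte: float  # lower is better
    area: float             # relative units, lower is better


def pareto_front(points):
    """Return the points not dominated by any other point, sorted by area."""
    front = []
    for p in points:
        dominated = any(
            q.cycles_per_byte <= p.cycles_per_byte
            and q.area <= p.area
            and (q.cycles_per_byte < p.cycles_per_byte or q.area < p.area)
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front, key=lambda p: p.area)


# Placeholder values for illustration only (NOT measured data).
candidates = [
    DesignPoint("baseline", 100.0, 1.0),
    DesignPoint("variant_a", 40.0, 1.3),
    DesignPoint("variant_b", 45.0, 1.6),
    DesignPoint("variant_c", 15.0, 2.0),
]
front_names = [p.name for p in pareto_front(candidates)]
```

With these placeholder values, variant_b drops out because variant_a is both faster and smaller; the remaining points form the tradeoff curve from which a designer would choose.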
2018.03 Release Update
Since the October 2017 newsletter, ASIP Designer has again seen a number of enhancements and extensions. The following is an extract, sorted by categories (please refer to the official Release Notes for the comprehensive list).
Example Models
Processor Modeling
Defining general load/store intrinsic functions (where guarded load/store functions are a special case), with improved LLVM support (no longer worst-case points-to information)
Simultaneous Hardware / Software Debugging
Figure 4: Hardware/Software Debugging using Verdi
C/C++ Compiler
RTL Generation and Synthesis Support, FPGA Prototyping
Labs, Tutorials and Documentation
Customer References
Cognitive Systems is a startup that developed an innovative security system based on wireless signal analysis. The application required a chipset that covers a wide spectrum between 650 MHz and 4 GHz, supporting a large variety of wireless standards. Read why Cognitive Systems decided on an ASIP, and how they managed to design a complex SIMD/VLIW DSP in less than 12 months, with a small team.
White Papers
In order to develop a proprietary processor that can stand the test of time, a highly functional SDK must be developed. The complexity, cost and duration of SDK development vary depending on the architecture of the processor and the skillset of the SDK developers. In this paper, we analyze the requirements for an SDK. We then introduce a tool-based methodology for SDK development based on Synopsys' ASIP Designer tool suite.
Rapid Architectural Exploration in Designing Application-Specific Processors
Architectural exploration is at the heart of any ASIP design approach. Designers need to rapidly explore the impact of different architectural choices on power consumption and performance, ideally using real-world application C-code as part of the design flow. This white paper explains the architectural tradeoffs that are available to an ASIP designer, how to trade off performance vs. area, and why an ASIP design can still maintain full C-programmability while being optimized for a certain application domain.
Designing ASIPs in Multicore SoCs
Modern SoCs integrate dozens of complex system functions, each requiring its own optimal balance of performance, flexibility, energy consumption, communication, and design time. The traditional model of a (configurable) general-purpose processor core with a number of fixed hardware accelerators no longer suffices. ASIPs can offer the best balance for each system function, and thus form the basis of new generations of multicore SoCs.