ASIP Designer is Synopsys’ solution for efficiently designing and implementing your own application-specific instruction-set processor (ASIP) when you can’t find suitable processor IP, or when hardware implementations require more flexibility.
This bi-annual newsletter provides you with easy access to ASIP-related resources. This issue includes the following topics:
Designers can choose from an extensive library of example processor models provided as nML source code. In combination with ASIP Designer, these models can be used as a starting point for architectural exploration and customer-specific production designs.
A new Example Processor Models web page is now available. It provides a concise overview of the example processor models available with ASIP Designer and their features.
In this section we elaborate on a new example processor model that is introduced with the 2023.06 release of ASIP Designer. It is called “Tsec” and implements an accelerator for post-quantum cryptography.
Kyber, the first standardized key-encapsulation mechanism (KEM) designed to withstand attacks by powerful future quantum computers, is computationally very demanding, in part because of its extensive use of hashing. The Tsec example is an ASIP optimized for accelerating Kyber. It evolved from a RISC-V base model to which custom application-specific instructions were added, along with architectural specializations that go beyond simple RISC-V extension mechanisms, such as heterogeneous storage.
The underlying base model is Trv32p5x, a previously existing example processor model with a RISC-V scalar instruction set (RV32IM) and 5 pipeline stages, enhanced with DSP-type extensions including:
- A zero-overhead looping mechanism that efficiently implements loops iterating over arrays
- Load and store instructions with a post-modify addressing mode, which update pointers without instruction overhead
- 2-way instruction-level parallelism to support the simultaneous execution of a compute operation and a memory access
Using the rich profiling capabilities of ASIP Designer, an open-source software implementation of the Kyber algorithm was simulated and profiled on the baseline model. Two main computational kernels were identified as the dominating bottlenecks: modular finite-field operations such as “Montgomery reduction” and “Barrett reduction”, and a hashing mechanism called “Keccak state permutation”.
The Montgomery and Barrett reduction functions could be accelerated by fusing them into single instructions. These fused instructions operate just like a custom scalar ALU instruction on the central register file X.
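As a rough software reference for what such fused instructions compute, the sketch below follows the approach of the public Kyber reference code, with modulus q = 3329. It is not the Tsec hardware description: the function names, the constant names and the 16-bit wrapping helper are illustrative assumptions.

```python
Q = 3329                                  # Kyber prime modulus
QINV = pow(Q, -1, 1 << 16)                # q^-1 mod 2^16
BARRETT_V = ((1 << 26) + Q // 2) // Q     # rounded 2^26 / q


def to_int16(x):
    """Wrap an integer to a signed 16-bit value, as a 16-bit datapath would."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def montgomery_reduce(a):
    """Return r with r * 2^16 == a (mod Q), valid for |a| < 2^15 * Q."""
    t = to_int16(a * QINV)        # low 16 bits of a * q^-1
    return (a - t * Q) >> 16      # exact: a - t*Q is a multiple of 2^16


def barrett_reduce(a):
    """Return the centered representative of a signed 16-bit value a mod Q."""
    t = (BARRETT_V * a + (1 << 25)) >> 26   # nearest integer to a / Q
    return a - t * Q


def fqmul(a, b):
    """Finite-field multiplication followed by Montgomery reduction."""
    return montgomery_reduce(a * b)
```

In software, each reduction costs a multiply, a shift and a subtract on top of the field operation itself, which is the sequence a fused instruction would collapse into a single operation.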
Figure 1: Arithmetic unit for Montgomery reduction
Figure 1 depicts the custom hardware resource needed for a single fused instruction performing Montgomery reduction. The resource for Barrett reduction looks very similar, so the two were merged and are shared between the instructions. Furthermore, multiple instances of the Barrett reduction block, along with adders and finite-field multipliers, were combined into a larger butterfly-like hardware block, depicted in Figure 2, which is triggered by even more specialized single instructions.
Figure 2: Custom butterfly unit with Barrett reduction logic and finite-field multipliers
The debugger snapshot in Figure 3 shows how the specialized butterfly instructions are utilized by the compiler in the innermost loop of the number-theoretic transform (NTT) function.
Figure 3: Software-pipelined NTT function
The innermost loop is implemented as a hardware loop (zlp). Its body is software-pipelined into six instructions comprising butterfly instructions, finite-field multiplications and additions, with memory accesses scheduled in parallel.
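For orientation, the loop nest that such an NTT function contains can be sketched in plain software as follows. This is a minimal model assuming the standard Kyber parameters (q = 3329, root of unity 17, 256 coefficients) and ordinary modular arithmetic instead of Montgomery form. The names `butterfly`, `ntt` and `ZETAS` are illustrative and not taken from the Tsec model or its compiled code.

```python
Q = 3329        # Kyber prime modulus
ZETA = 17       # primitive 256th root of unity modulo Q
N = 256         # number of polynomial coefficients


def bitrev7(i):
    """Reverse the 7-bit binary representation of i."""
    return int(format(i, "07b")[::-1], 2)


# Twiddle factors in bit-reversed order (plain residues, not Montgomery form).
ZETAS = [pow(ZETA, bitrev7(i), Q) for i in range(128)]


def butterfly(a, b, zeta):
    """Cooley-Tukey butterfly: one multiply, one add and one subtract mod Q."""
    t = (zeta * b) % Q
    return (a + t) % Q, (a - t) % Q


def ntt(coeffs):
    """Seven-layer forward NTT of 256 coefficients modulo Q."""
    r = list(coeffs)
    k = 1
    length = N // 2
    while length >= 2:
        for start in range(0, N, 2 * length):
            zeta = ZETAS[k]
            k += 1
            for j in range(start, start + length):   # innermost loop
                r[j], r[j + length] = butterfly(r[j], r[j + length], zeta)
        length //= 2
    return r
```

The innermost `for j` loop performs two loads, one butterfly and two stores per iteration with a fixed trip count, the kind of regular pattern that suits a hardware loop with parallel memory accesses.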
For the Keccak permutation function, the situation is a bit more complicated. The bit-level logic operations of the hashing mechanism can still be fused into one big logic cloud. The interface of the function, however, takes an entire array of 25 64-bit state variables as an argument, which results in extensive load/store traffic on the general-purpose register file. The general-purpose register file of the baseline processor (32 x 32-bit) is simply not big enough to hold 25 64-bit values simultaneously (1,600 bits of state versus 1,024 bits of register storage), and it would also be too expensive to add the number of parallel ports required by the Keccak operation.
Instead, we created a dedicated register file “S” with 25 fields of 64 bits, and with dedicated 64-bit load/store access to the data memory. In addition, each register field has a direct port to the Keccak logic, which can thus access all 25 fields in parallel, as depicted in Figure 4.
Figure 4: Keccak Unit with dedicated register file
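To make the size of this state concrete, here is a minimal software sketch of the Keccak-f[1600] permutation operating on a list of 25 64-bit lanes, written in the style of the public compact reference code. It is an illustration under stated assumptions, not the Tsec implementation: the function names and the small `sha3_256_short` wrapper (used only to cross-check against Python's `hashlib`) are hypothetical, and whether the Tsec instruction covers one round or several is not modeled here.

```python
import hashlib

MASK64 = (1 << 64) - 1


def rol64(value, shift):
    """Rotate a 64-bit lane left by shift bits."""
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def keccak_f1600(state):
    """Apply the 24-round permutation; lane (x, y) is stored at index x + 5*y."""
    a = list(state)
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [a[x] ^ a[x + 5] ^ a[x + 10] ^ a[x + 15] ^ a[x + 20] for x in range(5)]
        d = [c[(x + 4) % 5] ^ rol64(c[(x + 1) % 5], 1) for x in range(5)]
        a = [a[i] ^ d[i % 5] for i in range(25)]
        # rho and pi: rotate each lane and move it to a new position
        x, y = 1, 0
        current = a[x + 5 * y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, a[x + 5 * y] = a[x + 5 * y], rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row of five lanes
        for y in range(0, 25, 5):
            row = a[y:y + 5]
            for x in range(5):
                a[y + x] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: XOR an LFSR-generated round constant into lane (0, 0)
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                a[0] ^= 1 << ((1 << j) - 1)
    return a


def sha3_256_short(message):
    """SHA3-256 of a message that fits in one 136-byte rate block."""
    if len(message) >= 136:
        raise ValueError("message must be shorter than 136 bytes")
    block = bytearray(200)
    block[:len(message)] = message
    block[len(message)] ^= 0x06      # domain separation and first padding bit
    block[135] ^= 0x80               # last padding bit at the end of the rate
    lanes = [int.from_bytes(block[8 * i:8 * i + 8], "little") for i in range(25)]
    lanes = keccak_f1600(lanes)
    return b"".join(lane.to_bytes(8, "little") for lane in lanes)[:32]


if __name__ == "__main__":
    # Cross-check the sketch against the standard library implementation.
    print(sha3_256_short(b"Tsec") == hashlib.sha3_256(b"Tsec").digest())
```

Every step reads and writes all 25 lanes, which is why a software version on a 32 x 32-bit register file spends much of its time moving state to and from memory.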
The debugger snapshot in Figure 5 shows how the compiler schedules a single-cycle instruction triggering the Keccak logic, embedded in a single-instruction hardware loop, which is surrounded by memory load/store instructions to the special S register file.
Figure 5: Single-cycle Keccak instruction scheduled in a single-instruction hardware loop
Figure 6 is a screenshot of the nML viewer, a utility to graphically inspect the hierarchy of the instruction set. It shows how the custom Keccak instruction and the special finite-field instructions (grouped under “kyber_instrs”) are integrated in both the single-issue 32-bit instruction format and the parallel dual-issue 64-bit instruction format.
Figure 6: Graphical view of the Tsec instruction set (partially expanded)
The new Tsec example model illustrates how ASIP Designer can be used to extend a RISC-V baseline architecture for higher performance. The specialization for the Keccak state permutation and the reduction functions results in an 8.3x speed-up of the Kyber algorithm compared to the original RISC-V baseline implementation with DSP extensions, at a moderate gate-count increase by a factor of 1.8.
In June 2023, following the last edition of this newsletter, we launched a new feature release of ASIP Designer with various enhancements and extensions. The following is an extract, sorted by category (customers can refer to the official Release Notes for a comprehensive list).
In the 2023.06 release the following updates were made to the library of example processor models:
ASIP Designer comes with a unique and patented compiler solution, with the compiler automatically retargeting itself to the processor architecture. This eliminates any need for compiler backend customization by the user. Release 2023.06 offers the following enhancements: