草榴社区

ASIP eUpdate, February 2022


ASIP Designer

ASIP Designer is 草榴社区’ solution for efficiently designing and implementing your own application-specific instruction-set processor (ASIP) when you can’t find suitable processor IP, or when hardware implementations require more flexibility.

This bi-annual newsletter provides you with easy access to ASIP-related resources.

Technology Feature: Wide scope of RISC-V ASIP models ready for ASIP accelerator development

When developing an ASIP architecture, engineering teams typically do not start from a blank sheet of paper. Often, ASIP Designer customers start from one of the example processor models that are shipped with the tool installation. This library contains models that are based either on publicly known ISAs, such as Hennessy and Patterson’s DLX or the more recent RISC-V ISA, or on other ad-hoc ISAs. The examples demonstrate how to model specific architecture features such as SIMD, VLIW, floating point, multi-threading and many others. 草榴社区 is continuously extending its library of ASIP example models. For example, a wide range of RISC-V ISA based models has been created, and these are frequently used by customers as a starting point to design proprietary ASIP accelerators. Using a RISC-V ISA baseline facilitates compatibility with, and reuse of, existing processor ecosystem elements.

In ASIP Designer, the Trv family of processors implements the RISC-V ISA.  These models implement the base integer ISA and various ISA extensions.   In this section we will elaborate on the members of the Trv family listed in Figure 1.  All these variants are verified against reference implementations of the RISC-V ISA. They are fully supported by all components of the ASIP Designer tool suite, including C/C++ compilation, the generation of both cycle- and instruction-accurate simulation models (that can also be integrated in a virtual platform), RTL generation, and on-chip debugging.

Figure 1: Trv family of processor models

Integer models

The integer models implement the RV32IM or RV64IM base integer instructions and multiplication extension. They come in versions with a 32- or a 64-bit wide data path and with a three- or five-stage protected pipeline. Multiplications are executed on a hardwired multiplier; divisions are executed on an iterative divider unit.

These models are optimized for area and clock frequency. Depending on the configuration and on the clock frequency, the gate count ranges from 28k to 40k gates for the 32-bit variants. For a 28nm technology, clock frequencies (Fmax) as high as 1.4 GHz can be achieved. Using the ASIP Designer compiler, a performance of 3.3 CoreMark/MHz is reached.
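As a rough back-of-the-envelope illustration, the two figures above can be combined into an absolute CoreMark score. The sketch below assumes that the 3.3 CoreMark/MHz score and the 1.4 GHz Fmax apply to the same configuration, which the text does not state; the function name is purely illustrative.

```python
def coremark_score(coremark_per_mhz: float, clock_mhz: float) -> float:
    """Absolute CoreMark score from a per-MHz score and a clock frequency in MHz."""
    return coremark_per_mhz * clock_mhz


# Figures quoted above: 3.3 CoreMark/MHz and an Fmax of up to 1.4 GHz (28nm).
# Assumption: both figures hold for the same model configuration.
peak = coremark_score(3.3, 1400.0)
print(f"about {peak:.0f} CoreMark at 1.4 GHz")
```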

Floating-point models

The floating-point models add the F extension instructions to the 32-bit wide data path integer models, implementing the RV32IMF ISA. Additive and multiplicative instructions are executed on a hardwired fused multiply-add unit with a throughput of one operation per clock cycle. For the 5-stage version, this unit is pipelined. Other hardwired units implement compare, min/max and conversion instructions. The floating-point division and square-root instructions are executed on a shared iterative unit.

The gate counts for these models are in the range of 55k to 90k gates. The Fmax of the 5-stage variant is 1.2 GHz (28nm).

Extended models

As mentioned, the purpose of the Trv models is to have a solid starting point for the development of an application-specific processor. ASIPs often target compute-intensive applications where arrays of data that are stored in memory must be processed. For this type of application, we observe that the RISC-V ISA lacks the following important features:

  • A zero-overhead looping mechanism that makes it possible to efficiently implement the loops that iterate over the arrays.
  • Load and store instructions with a post-modify addressing mode, which make pointer updates possible without instruction overhead.
  • Instruction-level parallelism to support the simultaneous execution of a compute operation and a memory access.

To provide an improved starting point for compute-intensive applications, we have developed models that extend the standard RISC-V ISA with these features. These models have an <x> suffix to their name (see Figure 1). Figure 2 shows the ISA that is supported by these models. It contains:

  • The standard RV32IM 32-bit instructions, encoded with the “11” format bits at the LSB side.
  • Dual-issue (parallel) instructions. Slot 1 supports compute operations (ALU, multiply, divide) and control operations. Slot 2 supports load/store operations and the ability to execute a register move in parallel with a slot 1 operation. The dual-issue instructions are 64 bits wide and are encoded with the “10” format field.
  • Additional 64-bit instructions, under the “00” format. These are single-issue control-flow and compute instructions with a long 32-bit immediate operand. When large constants or long jumps are present in the code, these instructions offer a more efficient alternative to the two-step approach for constant or jump-target generation that is needed when only standard RISC-V instructions are present.

Figure 2: Instruction formats supported by Trv<x> processor models (visualization by ASIP Designer's nMLView tool)
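The format field described above can be illustrated with a small classifier. This is a minimal sketch, not the tool's decoder: it assumes the format field is the two least-significant bits of the instruction word passed in, and the names `Format`, `FORMATS` and `classify` are hypothetical. The “01” value is not described in the text, so the sketch refuses to guess.

```python
from typing import NamedTuple


class Format(NamedTuple):
    kind: str
    bits: int  # instruction width in bits


# Format field values as described in the list above.
FORMATS = {
    0b11: Format("standard RV32IM", 32),
    0b10: Format("dual-issue", 64),
    0b00: Format("single-issue, long immediate", 64),
}


def classify(word: int) -> Format:
    """Classify an instruction by the two least-significant bits of `word`."""
    fmt = word & 0b11
    if fmt not in FORMATS:
        # "01" is not covered by the description, so no guess is made here.
        raise ValueError(f"format bits {fmt:02b} not covered by this sketch")
    return FORMATS[fmt]


# 0x02f808b3 is the standard RISC-V encoding of mul x17, x16, x15.
print(classify(0x02F808B3))
```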

The following code is generated by the ASIP Designer compiler for the inner loop of the CoreMark matrix multiplication code. Note that it contains an instruction that executes an addition and a load in parallel. The lh instruction uses post-modify addressing.

Address  Encoding           Assembly
5680     00071781 00370732  add x14, x14, x3  | lh x15, 0(x14)
5688     0024980b           lh  x16, 2(x9!)
5692     02f808b3           mul x17, x16, x15
5696     80000031           add x8,  x8,  x17
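To make the data flow of this loop body easier to follow, here is a behavioral sketch in Python. It rests on several assumptions that the listing does not spell out: both slots of the parallel instruction read the old value of x14, `2(x9!)` means "load from x9, then add 2 to x9", memory is little-endian, and the loop count is handled outside the four instructions shown. The function names are illustrative.

```python
import struct


def load_half(mem: bytes, addr: int) -> int:
    """Signed 16-bit little-endian load, like the RISC-V lh instruction."""
    return struct.unpack_from("<h", mem, addr)[0]


def inner_loop(mem: bytes, a_ptr: int, b_ptr: int, b_stride: int, n: int) -> int:
    """Accumulate n products, mirroring the four instructions of the listing."""
    acc = 0
    for _ in range(n):
        # add x14, x14, x3 | lh x15, 0(x14): assumed to both read the old x14
        b = load_half(mem, b_ptr)
        b_ptr += b_stride
        # lh x16, 2(x9!): assumed to load from x9, then post-increment x9 by 2
        a = load_half(mem, a_ptr)
        a_ptr += 2
        # mul x17, x16, x15 followed by add x8, x8, x17 (32-bit wraparound)
        acc = (acc + a * b) & 0xFFFFFFFF
    return acc


# One row [1, 2, 3] followed by a 3x2 row-major matrix; walk its first column.
memory = struct.pack("<9h", 1, 2, 3, 10, 11, 20, 21, 30, 31)
print(inner_loop(memory, a_ptr=0, b_ptr=6, b_stride=4, n=3))
```

In this model the pointer updates cost no separate step: one is folded into the parallel instruction and the other into the post-modify load.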

SDX models

The Trv-SDX is an example processor model that implements the RISC-V ISA, and additionally contains templates for extension instructions.  These templates are encoded using the RISC-V custom-2 opcode space, which has been reserved in the standard to enable custom ISA extensions. The Trv-SDX model was covered in detail in the October 2020 ASIP eUpdate.
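For readers unfamiliar with the custom opcode spaces, the sketch below shows how an instruction word can be placed in, and recognized as belonging to, the custom-2 space, whose major opcode in the RISC-V base opcode map is 1011011. The R-type field layout is the standard RISC-V one; whether the Trv-SDX templates use exactly this layout is an assumption, and the function names are hypothetical.

```python
CUSTOM_2 = 0b1011011  # major opcode "custom-2" in the RISC-V base opcode map


def encode_r_type(opcode: int, rd: int, funct3: int,
                  rs1: int, rs2: int, funct7: int) -> int:
    """Pack the standard RISC-V R-type fields into a 32-bit word."""
    fields = (("opcode", opcode, 7), ("rd", rd, 5), ("funct3", funct3, 3),
              ("rs1", rs1, 5), ("rs2", rs2, 5), ("funct7", funct7, 7))
    for name, value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def is_custom_2(word: int) -> bool:
    """True if the 7-bit major opcode of `word` is custom-2."""
    return (word & 0x7F) == CUSTOM_2


# Hypothetical extension instruction: rd=x1, rs1=x2, rs2=x3, funct3=0, funct7=0.
word = encode_r_type(CUSTOM_2, rd=1, funct3=0, rs1=2, rs2=3, funct7=0)
print(f"{word:08x}", is_custom_2(word))
```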

Tmoby model

Tmoby is an example of an application-specific processor that was designed starting from the Trv32p5 model (Figure 3). The objective was to design an accelerator for convolutional neural networks like MobileNet. We targeted medium-throughput applications and decided upfront to allocate a vector data path with 64 MAC units, each capable of executing an 8x8-bit multiplication and an 18-bit accumulation. The vector data path contains multiple vector register files: VEC stores a vector of 8 features, and MAT stores 64 weights. The feature vector is replicated eight times and multiplied with the weights. The resulting products are added to the accumulator ACC. To sustain a throughput of one vector MAC per cycle, we need to load a new feature vector and a new weight vector each cycle. This is achieved by allocating two vector memories, VM and WM. In the ISA, we provide two loads as parallel operations to the vector MAC. The VLIW structure has a fourth slot, which hosts the RISC-V scalar instructions. With this architecture we can accelerate MobileNet V3 by a factor of 360 compared to a scalar RISC-V core.
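The vector MAC step described above can be sketched behaviorally as follows. Several details are assumptions rather than facts from the text: each of the 64 lanes is given its own accumulator, lane i is paired with feature i modulo 8, the 8-bit operands are treated as signed, and the 18-bit accumulator wraps around on overflow instead of saturating. All names are illustrative.

```python
LANES = 64      # MAC units in the vector data path
FEATURES = 8    # elements in the VEC register
ACC_BITS = 18   # accumulator width per lane


def wrap_signed(value: int, bits: int = ACC_BITS) -> int:
    """Wrap an integer into the signed two's-complement range of `bits` bits."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def vector_mac(acc, vec, mat):
    """One vector MAC: ACC[i] += VEC[i mod 8] * MAT[i] for all 64 lanes (assumed order)."""
    if len(acc) != LANES or len(mat) != LANES or len(vec) != FEATURES:
        raise ValueError("expected 64 accumulators, 64 weights and 8 features")
    for operand in list(vec) + list(mat):
        if not -128 <= operand <= 127:
            raise ValueError("operands are assumed to be signed 8-bit values")
    return [wrap_signed(acc[i] + vec[i % FEATURES] * mat[i]) for i in range(LANES)]


acc = vector_mac([0] * LANES, list(range(1, 9)), [2] * LANES)
print(acc[:8])
```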

Figure 3: Tmoby ASIP architecture, with RISC-V scalar data-path (far left) and vector data-path extensions

What’s New: ASIP Designer S-2021.12 Release

In December 2021, we launched the latest feature release of ASIP Designer, providing various enhancements and extensions. The following is an extract, sorted by categories (customers can refer to the official Release Notes for a comprehensive list). 

Note that the release schedule of ASIP Designer and ASIP Programmer has been modified. The current feature release is the first one under the new release schedule.

  • Feature releases now appear in June (20xx.06) and December (20xx.12), instead of March (20xx.03) and September (20xx.09).
  • Service packs now appear every three months, instead of every six weeks. So, one service pack will be released in March (20xx.12-SP1) and one in September (20xx.06-SP1).

Example Processor Models

Designers can choose from an extensive library of example processor models provided as nML source code. In combination with ASIP Designer, these models can be used as a starting point for architectural exploration and customer-specific production designs. In the 2021.12 release there are two important updates for the Trv models: 

Processor Modeling

  • The nML language generalizes the matching conditions in syntax attributes, to control alternative assembler syntaxes
  • Generation of instruction decode functions for use in PDG
  • Direct writes to the elements of nML vector storages in PDG

C/C++ Compiler

ASIP Designer comes with a unique and patented compiler solution, with the compiler automatically retargeting itself to the processor architecture. This eliminates any need for compiler backend customization by the user. Release 2021.12 offers: 

  • When applying software pipelining to loops, the compiler can now swap live ranges allocated to the same register field, to further reduce the loop’s initiation interval and thus the application’s overall cycle count.
  • LLVM’s Link Time Optimization (LTO) functionality has been added to the compilation flow, allowing for whole program optimization.
  • The LLVM-based front-end has been updated to the most recent LLVM version 13.0.
  • Improved compiler support has been added for the unique requirements of the RISC-V ISA. This includes two-step constant generation where the signedness of the lower half word ripples into the higher half word, two-step far conditional jumps, and spilling of vector registers in the absence of a stack pointer indexed addressing mode.
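The sign ripple in two-step constant generation mentioned in the last item can be illustrated with the standard RV32 `lui` + `addi` pair, where `lui` sets the upper 20 bits and `addi` adds a sign-extended 12-bit immediate. The sketch below is a generic illustration of that arithmetic, not the ASIP Designer compiler's implementation; the function names are hypothetical.

```python
from typing import Tuple


def split_constant(value: int) -> Tuple[int, int]:
    """Split a 32-bit constant into (upper 20 bits for lui, signed 12 bits for addi)."""
    value &= 0xFFFFFFFF
    lo = value & 0xFFF
    if lo >= 0x800:
        lo -= 0x1000  # addi sign-extends its immediate, so the low part is negative
    # A negative low part "ripples" into the upper part, which must be one higher.
    hi = ((value - lo) >> 12) & 0xFFFFF
    return hi, lo


def rebuild(hi: int, lo: int) -> int:
    """Model lui followed by addi on a 32-bit register."""
    return ((hi << 12) + lo) & 0xFFFFFFFF


hi, lo = split_constant(0x12345FFF)
print(f"lui 0x{hi:05x}; addi {lo}")
```

For 0x12345FFF the lower part becomes -1, so the upper part has to be 0x12346 rather than 0x12345.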

Instruction-Set Simulation and Debugging

  • Users can now dynamically (at run-time) configure the desired debug features in both the cycle-accurate and the instruction-accurate simulator.  This avoids any performance overhead of unused debug features, and removes the need to build multiple ISS variants (each with different features enabled) for different use scenarios.
  • Debug features (such as profiling or various checks) are now disabled by default when the ISS starts, unless they were configured at ISS build time to be enabled from the start. They can always be enabled at run-time.

RTL Generation, Verification, and Synthesis Support

  • A new functional safety feature has been added, in the form of automatic generation of dual-core lockstep (DCLS) functionality in the ASIP’s RTL implementation.
  • Enhanced exploration capabilities are offered for synthesis-in-the-loop, with a new driver script to generate RTL variants (sweeping through alternative option settings of the RTL generator or alternative nML variants) for evaluation with 草榴社区’ RTL Architect tool.
  • Updated Reference Methodology scripts are provided for RTL Architect and for Fusion Compiler.
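As a conceptual illustration of the dual-core lockstep idea listed above, the sketch below runs two copies of a toy core on the same inputs and reports the first cycle in which their outputs differ. It is a software analogy only and says nothing about how the generated RTL is structured; the `AccumulatorCore` class, its fault-injection parameter and `run_lockstep` are all hypothetical.

```python
from typing import Iterable, Optional


class AccumulatorCore:
    """Toy core: accumulates inputs; can flip one bit at a chosen cycle."""

    def __init__(self, fault_at: Optional[int] = None):
        self.acc = 0
        self.cycle = 0
        self.fault_at = fault_at

    def step(self, value: int) -> int:
        self.acc = (self.acc + value) & 0xFFFFFFFF
        if self.cycle == self.fault_at:
            self.acc ^= 1  # injected single-bit fault
        self.cycle += 1
        return self.acc


def run_lockstep(main: AccumulatorCore, shadow: AccumulatorCore,
                 inputs: Iterable[int]) -> Optional[int]:
    """Return the first cycle where the two cores disagree, or None if they never do."""
    for cycle, value in enumerate(inputs):
        if main.step(value) != shadow.step(value):
            return cycle
    return None


print(run_lockstep(AccumulatorCore(), AccumulatorCore(fault_at=1), [1, 2, 3]))
```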
