Synopsys

ASIP University Day


ASIP University Day Virtual Event

Proceedings are available.


If you were unable to attend, or attended and would like to review the material, scroll down to the agenda to download PDFs of the presentations.


Domain-Specific Processor Design using ASIP Designer

Application-specific instruction set processors (ASIPs) have established themselves as an important implementation option for modern SoCs, particularly when standard processor IP cannot meet challenging application-specific requirements and fixed hardware is not flexible enough. Heterogeneous multicore systems that include ASIPs are becoming mainstream. Domains such as 5G, data centers, artificial intelligence, image and video processing, and automated driving assistance have fueled the development of such ASIPs and triggered many university projects. Processor design projects such as the RISC-V initiative have also generated a lot of interest. With all the commercial activity around RISC-V these days, the initiative has outgrown its origins at UC Berkeley.

Synopsys' ASIP Designer is the market-leading tool for ASIP design, verification, and programming. It is used by leading companies around the globe, with hundreds of successful projects to date.

At this informal event, leading university teams presented results from their ongoing ASIP projects in a variety of application domains such as 5G baseband and AI accelerators. Synopsys shared insight on market trends and provided a technical update on ASIP Designer along with reference examples.

Agenda

6:10am - 6:45am PST
3:10pm - 3:45pm CET

Getting Started … Application-specific Processors (ASIPs) in System-on-Chip Design

Patrick Verbist - Product Marketing Manager, Synopsys
Falco Munsche - Technical Marketing Manager, Synopsys

ASIPs have established themselves as an implementation option next to standard processor IP and fixed-function RTL. They combine hardware specialization with flexibility through software programmability. This talk will provide an introduction to Synopsys' ASIP Designer tool-suite, its target markets, and how Synopsys collaborates with university partners in this domain.


6:45am - 7:15am PST
3:45pm - 4:15pm CET

FlexACC: A Programmable Accelerator with Application-Specific ISA for Flexible Deep Neural Network Inference

En-Yu (Daniel) Yang - PhD student in Computer Science at Harvard University

Deep neural networks (DNNs) have become ubiquitous and dominant in various application domains due to their state-of-the-art learning capabilities. To run compute- and memory-intensive DNN models, designing specialized hardware accelerators has become the common choice. However, the performance improvement in accelerators comes with limitations on programmability, which has become crucial given the rapid evolution of DNN models. In this work, we first conduct workload analysis on a diverse set of DNN models, including CNN, LSTM, Transformer, and GCN, to demonstrate the challenges of generalizing DNN acceleration. Next, we present a highly programmable accelerator, referred to as FlexACC, with a novel application-specific ISA for flexible DNN inference. To increase programmability, the general-purpose RISC-V instructions are tightly coupled with DNN instructions in the FlexACC ISA. Compared with standalone fixed-datapath CNN and LSTM engines, FlexACC has only a small latency and area overhead, while providing much higher programmability and flexibility. ASIP Designer is used in the hardware implementation of FlexACC, and this work has been accepted to the ASAP 2021 conference.


7:15am - 7:45am PST
4:15pm - 4:45pm CET

AI-RISC - Scalable RISC-V Processor with Tightly Integrated AI Accelerators and Custom Instruction Extensions

Vaibhav Verma - PhD Student in the Electrical Engineering Department, University of Virginia
Mircea R. Stan - Virginia Microelectronics Consortium Professor, ECE Department, University of Virginia

Artificial intelligence (AI) accelerators are often specialized for a particular task, are very costly to produce, require special programming tools, and become obsolete as new AI algorithms are introduced. Hence, there is an urgent need for a system-level solution to streamline the integration of different AI accelerators into standard computing and programming stacks. We present AI-RISC as a solution to bridge this research gap. AI-RISC adopts a hardware/software codesign methodology in which AI accelerators are integrated into the RISC-V processor pipeline at fine granularity and treated as regular functional units during the execution of instructions. AI-RISC also extends the RISC-V ISA with custom instructions that directly target these AI functional units (AFUs), resulting in a tight integration of AI accelerators with the processor. AI-RISC adopts a two-step compilation strategy in which open-source TVM is used as the front-end compiler while Synopsys ASIP Designer is used as the back-end for complete SDK generation. AI-RISC provides a 1.75x (MAC) to 17.63x (PIM VMM) performance improvement for a GEMV kernel and a 1.41x (MAC) to 4.41x (PIM VMM) reduction in processor clock cycles for the ResNet-8 neural network model from the TinyMLPerf benchmark, depending on the size of the added accelerators and the complexity of the added instructions.
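For readers unfamiliar with the GEMV kernel mentioned in this abstract, the following is a minimal Python sketch of a matrix-vector product written as a chain of multiply-accumulate (MAC) steps. It is only an illustration of the arithmetic that a MAC functional unit would speed up. The function names `mac` and `gemv` are hypothetical and do not come from AI-RISC, TVM, or ASIP Designer.

```python
def mac(acc, a, b):
    """One multiply-accumulate step (hypothetical helper, not an AI-RISC instruction)."""
    return acc + a * b


def gemv(matrix, vector):
    """Compute y = A x using one MAC step per matrix element."""
    result = []
    for row in matrix:
        acc = 0
        for a, x in zip(row, vector):
            acc = mac(acc, a, x)
        result.append(acc)
    return result


if __name__ == "__main__":
    print(gemv([[1, 2], [3, 4]], [5, 6]))
```

In this sketch every matrix element costs one call to `mac`, which is why a processor that executes a MAC as a single instruction can be expected to reduce the cycle count of such a kernel.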


7:55am - 8:25am PST
4:55pm - 5:25pm CET

An Application Specific Vector Processor for CNN-based Massive MIMO Positioning 

Mohammad Attari - PhD student in the Digital ASIC Research Group in the Electrical and Information Technology (EIT) Department, Lund University

As 5G-capable networks and devices gradually roll into the market, they bring with them a host of exciting applications. One example use-case is terminal positioning, and in this work we set out to create an application specific instruction set processor (ASIP) implementation to enable user positioning in a wireless system by means of deep convolutional neural networks (CNN). The positioning is based on the fingerprinting method using massive multiple-input multiple-output (MIMO) technology, and utilizes the wireless channel state information (CSI). The ASIP is designed to combine flexibility with implementation efficiency, and is equipped with vector processing capabilities employing a single instruction multiple data (SIMD) scheme, and additionally has a very large instruction word (VLIW) architecture to further exploit instruction-level parallelism. Due to the sheer volume of computational requirements imposed by CNN processing, an accelerator-assisted design is well-suited to the task at hand. As a result, a configurable 2D array of processing engines (PE) is integrated into the processor, in a tightly coupled manner, to accelerate the CNN operation. Synthesis results will be demonstrated using the GF-22nm FD-SOI technology.


8:25am - 8:55am PST
5:25pm - 5:55pm CET

Flexible Channel Estimation for 3GPP 5G IoT on a Vector Digital Signal Processor

Stefan Damjancevic - PhD Student in the Signal Processing Hardware Group at the Vodafone Chair Mobile Communications Systems, TU Dresden

The new 5G Reduced Capability (RedCap) protocol offers up to 88x and 528x higher data rates and dynamic pilot placements compared to previous Cat-M and NB-IoT standards, respectively. This leads to high application variability of IoT devices and therefore poses a challenge for the implementation of channel estimation (CE), especially under weak radio signal conditions. However, due to the computational complexity of optimal methods, practical suboptimal approaches with denoising capability are preferred in low-power devices. This work investigates the performance and implementation aspects of practical IoT CE denoising techniques on a vector digital signal processor (vDSP). This solution enables adaptation to the new IoT workload requirements with a 15.9x speed-up compared to the non-vectorised approach at 99.2% processor efficiency. In addition, for the purpose of solution adaptation to various IoT standards, the clock frequency requirements for the complete channel estimation chain are analysed with respect to different processor configurations.


8:55am - 9:25am PST
5:55pm - 6:25pm CET

Tmatch, a Flexible Stereo Image Matching Accelerator Designed with ASIP Designer

Erik Brockmeyer - Senior Applications Engineer, Synopsys

Sum of Squared Differences (SSD) computations are used to calculate the pixel disparity in stereo image matching algorithms. These algorithms are very demanding in terms of processing power (about 30 TMACs/s). We optimized the application code and co-developed the optimized instructions at a high level. Using the compiler-in-the-loop flow offered by ASIP Designer, we easily verified the correctness of the application code and evaluated the performance impact. In a few weeks' time we explored multiple efficient implementation solutions and made performance vs. cost tradeoffs using the compiler- and synthesis-in-the-loop optimization flows. The end result of this design effort was a highly specialized vector ASIP with limited ILP. The ASIP features design techniques that enable the reuse of partially computed results, and multiple specialized memories in parallel with specialized addressing modes.
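As a rough illustration of the SSD cost this abstract refers to, the sketch below performs a simple window-based disparity search in plain Python. It is not the Tmatch implementation. The function names, the window size, and the assumption that a feature in the left image appears shifted toward lower column indices in the right image are all choices made for this example.

```python
def ssd(left, right, row, col, disparity, half):
    """Sum of squared differences between a (2*half+1)^2 window centered at
    (row, col) in the left image and the window shifted `disparity` columns
    toward lower indices in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            diff = left[row + dy][col + dx] - right[row + dy][col + dx - disparity]
            total += diff * diff
    return total


def best_disparity(left, right, row, col, max_disparity, half=1):
    """Return the disparity in [0, max_disparity] with the smallest SSD cost."""
    best, best_cost = 0, None
    for d in range(max_disparity + 1):
        if col - half - d < 0:  # shifted window would leave the right image
            break
        cost = ssd(left, right, row, col, d, half)
        if best_cost is None or cost < best_cost:
            best, best_cost = d, cost
    return best


if __name__ == "__main__":
    # Synthetic pair: the left image is the right image shifted by 2 columns.
    right = [[c * c + 3 * r for c in range(12)] for r in range(5)]
    left = [[right[r][c - 2] if c >= 2 else 0 for c in range(12)] for r in range(5)]
    print(best_disparity(left, right, row=2, col=6, max_disparity=4))
```

Each candidate disparity costs one multiply-accumulate per window pixel, and neighboring windows share most of their squared differences. This exhaustive version recomputes all of them and is meant only to show the arithmetic.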


9:35am - 10:05am PST
6:35pm - 7:05pm CET

Utilizing ASIP Designer for Industry Projects, Teaching Activities and Research

Lennart M. Reimann - PhD Student and Research Assistant at the Institute for Communication Technologies and Embedded Systems, RWTH Aachen

At our institute, we use ASIP Designer in all three of our areas of work: research, teaching and industry projects. 

For two years, we have been teaching students how to optimize RISC-V processors for two cryptographic algorithms: AES and SHA. The students learn to understand the structure of processors and processor description languages. The lab starts with a step-by-step manual on how to optimize for the first algorithm. For the second algorithm, the students are on their own: they must make their own design decisions and evaluate them against the metrics we provide, using our tools that report those metrics.

Moreover, we are taking an experimental approach in one of our BMBF projects: we use ASIP Designer solely to design a compiler for a given RISC-V host processor that communicates with a dedicated co-processor. We will present the description of the underlying hardware and the challenges that await us.

In my research in the field of hardware security, we use the RTL generator and the compiler to design secure software for insecure hardware. As this work is not yet published, I will only be able to give a brief insight into it.


10:05am - 10:35am PST
7:05pm - 7:35pm CET

Learning Computer Architecture Through the ASIP Paradigm: A Research-oriented Approach

Pierre Langlois - Full Professor and Chair of the Computer and Software Engineering Department of Polytechnique Montréal

The course INF8505 – Embedded Configurable Processors – is a three-credit, 135-hour graduate computer architecture course that has been running at Polytechnique Montréal since 2008. It is also available as an elective to senior Electrical and Computer Engineering undergraduates. The course focuses on Application Specific Instruction set Processor (ASIP) design, and its main topics are custom datapath and custom memory hierarchy design, processor description languages, retargetable compilers, and processor performance metrics. It uses high-throughput applications such as deep learning, image processing, and cryptography to demonstrate the ASIP's potential. From its beginning, the course has been taught in a flipped-classroom style in which students are assigned one research paper every week, for which they must produce a one-page report. The paper is then discussed in class, and the instructor weaves the course topics together with the paper's main contributions. A major 52-hour course project is anchored in laboratory exercises. Two-student teams use Synopsys' ASIP Designer to design and simulate an ASIP tailored to an application of their choice, then submit a project report as a 4- or 6-page research paper. To date, there are almost 250 course alumni, and half a dozen project report papers have been presented at international conferences.


10:35am - 11:05am PST
7:35pm - 8:05pm CET

Under the Hood of ASIP Designer

Gert Goossens - Senior Director of Engineering, Synopsys

ASIP Designer enjoys strong adoption among SoC designers. The continuous evolution towards smarter connected systems is driving SoC design teams towards ever more advanced ASIP architectures, with larger amounts of instruction-level and data parallelism, deeper instruction pipelines, and more architectural specialization. In turn, this is driving Synopsys to continuously innovate and extend the scope of the ASIP Designer tool-suite.

In this presentation we will take a deeper dive into some of the key innovations behind the ASIP Designer tool-suite, with a focus on processor modeling and retargetable compilation. We will also describe a few new tool innovations that were introduced in the latest releases of ASIP Designer. These include extensions of the nML processor description language for more efficient modeling of parallelism, automatic test-program generation to enhance the verification of processor architectures, and a novel approach to PPA estimation.

