Synopsys' ARC® Processor Summit 2022 offered 18 sessions focusing on the latest technologies and trends in embedded processor IP, software, programming tools and applications.
Scroll down to browse the topics and register to download the presentations.
Artificial Intelligence Safety and Security
Dr. Roman V. Yampolskiy, Associate Professor, Dept of Computer Science and Engineering, University of Louisville, KY
Many scientists, futurologists and philosophers have predicted how AI will enable humanity to achieve radical technological breakthroughs in the years ahead. In his keynote, Dr. Yampolskiy will cover current progress in artificial intelligence and predicted future developments, including artificial general intelligence. The talk will address some obstacles to progress in development of safe AI as well as ethical issues associated with advanced intelligent machines. The problem of control will be covered in the context of safety and security of AI. The talk will conclude with some advice on avoiding failure of smart products and services.
Evolution and Trends Driving the Automotive Architecture and Ecosystems of the Future
Vasanth Waran, Sr. Director of Business Development, Synopsys
New use cases and architectures are changing how automotive Electronic Control Units (ECUs) are designed. OEMs and their suppliers are gearing up for a changing landscape precipitated by the adoption of autonomous vehicles, new designs for EVs and business model transformation (for example, Mobility as a Service, or MaaS). How do automakers and their ecosystem partners adapt to these new paradigms and address advanced compute and software solutions in this new landscape? We’ll discuss the changes that the automotive supply chain, including SoC suppliers, tier-1s and tier-2s, and HW & SW partners, is embracing to address these new challenges.
Addressing Processing, Safety and Security Needs for Evolving Automotive SoCs
Rich Collins, Product Marketing Director, Synopsys
Next-generation autonomous driving and advanced driver-assistance systems (ADAS) require complex safety-critical electronic components. The SoC designs used in these electronics must adhere to the ISO 26262 functional safety (FuSa) standard to achieve the highest automotive safety integrity level (ASIL). Synopsys offers a broad portfolio of certified, functional-safety-compliant processor IP for developing these safety-critical SoCs. This session will cover Synopsys' vision of current and evolving SoC-level safety architectures, safety-compliant ARC processors, and functional safety software and tools. It will also touch on the combination of safety and security, both of which need to be carefully architected in the early stages of SoC development.
Virtualized AI Workloads for Automotive Zonal Architectures – How and Why?
Fergus Casey, Director of R&D, Synopsys
“Virtualization” uses software to simulate hardware functionality, allowing multiple operating systems to share the same hardware resources. Applying virtualization to automotive zonal architectures enables additional levels of security and safety, as well as reducing hardware costs and power consumption. In this presentation, we will describe the requirements for virtualization of processors and AI accelerators used in automotive applications, the uses of spatial and temporal isolation, and case studies on virtualization for third-party applications and for functional safety.
A Scalable Framework for Fast Design Space Exploration of AI Workloads in Automotive SoCs
Carlos Román, Head of US 草榴社区 Architecture & Technical Sales, Sondrel
Sondrel’s Scalable Architecture Framework (SAF) defines a set of processes for Requirements Engineering, Systems Architecture and Virtual Prototyping. Several reference SoC architectures have been derived from this framework, each targeting specific application use cases. For automotive ASIL-D applications, the SFA350A reference architecture provides the necessary feature set and scalability options to support a wide range of automotive compute requirements.
Recent industry trends show that automotive AI applications are starting to employ ever more sophisticated neural network algorithms, such as Vision Transformers (ViT), which are now out-performing CNNs and RNNs on several benchmarks. In this talk, we will show how the requirements of complex AI workloads, such as ViT, are analysed, so that the System Architecture of the SFA350A is tuned accordingly.
Rapid design space exploration is accomplished using performance models of an ARC NPX6 NPU with a VPX5 DSP companion, a FlexNoC Interconnect, and an LPDDR5x memory subsystem to balance all available features and determine the optimal hardware configuration of the SFA350A. A notable attribute of the ARC NPX6-VPX5 combo is that it is compatible with the “slice architecture” formalism employed in the SAF. This is key to achieving fast design space exploration of the demanding AI applications that automotive SoCs are required to support now and in the foreseeable future.
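As a rough illustration of what a design space sweep involves, the sketch below enumerates hardware configurations and keeps those that meet a latency budget. It is not Sondrel's SAF or any vendor performance model: the roofline-style `latency_ms` estimate, the function names, and all workload and hardware numbers are assumptions chosen only for illustration.

```python
from itertools import product


def latency_ms(ops, bytes_moved, tops, gbps):
    """Roofline-style estimate: the slower of compute and memory traffic wins.

    ops         -- operations per inference (hypothetical workload figure)
    bytes_moved -- bytes transferred to/from external memory per inference
    tops        -- peak compute in tera-operations per second
    gbps        -- memory bandwidth in gigabytes per second
    """
    compute_ms = ops / (tops * 1e12) * 1e3
    memory_ms = bytes_moved / (gbps * 1e9) * 1e3
    return max(compute_ms, memory_ms)


def explore(ops, bytes_moved, tops_options, gbps_options, budget_ms):
    """Return every (tops, gbps, latency) combination that meets the budget."""
    feasible = []
    for tops, gbps in product(tops_options, gbps_options):
        t = latency_ms(ops, bytes_moved, tops, gbps)
        if t <= budget_ms:
            feasible.append((tops, gbps, t))
    # Sorting puts the smallest compute and bandwidth configuration first.
    return sorted(feasible)


# Hypothetical workload: 2 tera-operations and 1 GB of traffic per inference.
candidates = explore(2e12, 1e9, tops_options=[1, 4], gbps_options=[50, 100], budget_ms=600)
for tops, gbps, t in candidates:
    print(f"{tops} TOPS, {gbps} GB/s -> {t:.0f} ms")
```

A real flow would replace the closed-form estimate with simulation results from performance models, but the enumerate, evaluate and filter loop has the same shape.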
It Takes Two - Balancing Data Movement and Compute in a Radar Application for Maximum Performance on a Vector DSP
Pieter van der Wolf, Principal Architect, Synopsys
Typical DSP benchmarks published in marketing collateral assume an ideal scenario: data is available in local memory, and it is arranged so that the compute part of the targeted application achieves optimal results. Yet for most real-world applications, the limited size of the local memory requires that data be loaded from and stored to L2 / L3 memory using DMA transfers. Hence, users need to focus as much on the efficiency of the DMA transfers as on the compute to arrive at a balanced system solution. Specifically, a Vector DSP must allow DMA transfers to run in parallel with the compute, so that their latency can be hidden. Further, to support efficient compute, data should be organized properly in local memory.
This demands advanced DMA capabilities, to reorganize data on the fly during data movement, as well as a versatile suite of load/store instructions for efficient access to data in local memory. We will discuss the above aspects in detail, using the VPX Vector DSP as a reference. Using an example Radar application we will show how high-performance DSP processing can be implemented with efficient access to local memory and multi-dimensional DMA transfers happening in the background, to arrive at an efficient system solution.
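The idea of hiding DMA latency behind compute can be sketched with a double-buffered (ping-pong) loop and a simple cycle-count model. This is a minimal sketch under stated assumptions, not the VPX programming model: the function names are hypothetical, the "DMA" is an ordinary list copy that runs synchronously, and write-back of results is ignored.

```python
def process_tiles(data, tile_size):
    """Ping-pong loop: the fetch of tile i+1 is issued before tile i is processed."""
    tiles = [data[i:i + tile_size] for i in range(0, len(data), tile_size)]
    buffers = [None, None]  # two local-memory buffers
    results = []
    if not tiles:
        return results
    buffers[0] = list(tiles[0])  # prologue: fetch the first tile
    for i in range(len(tiles)):
        if i + 1 < len(tiles):
            # Stands in for starting an asynchronous DMA into the idle buffer.
            buffers[(i + 1) % 2] = list(tiles[i + 1])
        # Compute on the buffer that is already resident (sum of squares here).
        results.append(sum(x * x for x in buffers[i % 2]))
    return results


def sequential_cycles(n_tiles, dma_cycles, compute_cycles):
    """Each tile is fetched, then processed, with no overlap."""
    return n_tiles * (dma_cycles + compute_cycles)


def double_buffered_cycles(n_tiles, dma_cycles, compute_cycles):
    """First fetch and last compute are exposed; in between, the slower stage dominates."""
    if n_tiles == 0:
        return 0
    return dma_cycles + (n_tiles - 1) * max(dma_cycles, compute_cycles) + compute_cycles


print(process_tiles([1, 2, 3, 4, 5], tile_size=2))
print(sequential_cycles(8, 100, 150), double_buffered_cycles(8, 100, 150))
```

With these illustrative numbers the overlapped schedule takes 1300 cycles instead of 2000. Once the DMA time per tile exceeds the compute time, the transfer becomes the bottleneck, which is why transfer efficiency matters as much as the kernel itself.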
Impact of ISO26262 and ISO21434 on Tools and Software for Automotive Systems
Joachim Hampp, Product Architect, TASKING Germany GmbH
Safety and security standards require justification regarding the safe usage of tools. This is typically achieved through an approach based on tool qualification. TASKING will provide insight into how tool qualification helps your project meet these standards as well as what tasks a project team must perform itself. This session will be based on an automotive use case using a PPU (Parallel Processing Unit) based on ARC EV71, and will discuss connecting software running on a main compute core and the ARC-based PPU.
AI Enabled DSPs to Accelerators – Dialing in the Right Performance
Markus Willems, Sr. Product Marketing Manager & Gordon Cooper, Product Marketing Manager, Synopsys
AI applications are driving the need for more efficient neural network processing across a broad range of performance, power and price points, leading to various processor-based implementation options. This session will discuss the trade-offs between selecting an AI-enabled DSP and adding a dedicated AI accelerator. We will present customer use cases covering AI-enabled ARC processors, including ARC VPX, and accelerators, including Synopsys’ newest Neural Processing Units (NPUs). The importance of software support across processors will also be covered.
Cutting Through the Noise - High Performance Image Signal Processing Leveraging ARC AI Processors
Benny Munitz, VP Business Development, Visionary.ai
Conventional Image Signal Processors (ISPs) do an excellent job, so long as lighting conditions are good. As society becomes increasingly reliant on image sensors for both human and machine vision, however, we need to find ways of extending performance to more challenging light conditions to achieve product robustness. In this session, Benny Munitz of Visionary.ai talks about using embedded AI algorithms, running on the Synopsys ARC EV72 processor, to implement a sophisticated new software ISP capable of dramatically reducing image noise and increasing dynamic range. This provides much-needed additional degrees of freedom in the image pipeline implementation to achieve better results for both human and machine vision applications.
Highly Efficient Programming Environment for Handling AI Workloads
Tom Michiels, System Architect, Synopsys
Programming SoCs for AI workloads can be a daunting task. Machine learning algorithms can run on a variety of processor types (CPUs, GPUs, DSPs, NPUs, custom accelerators), which has traditionally limited software portability. In addition, neural networks continue to evolve (e.g., CNNs, LSTMs, RNNs, Transformers) and competing AI frameworks (e.g., TensorFlow, PyTorch, Caffe2) make standardization a challenge. This session will introduce a programming environment that will accept neural networks in virtually any industry-standard format and efficiently map them to a variety of AI processor types, abstracting the underlying hardware from the AI programmer. Optimization techniques that improve execution performance and hardware resource utilization will also be discussed.
Design of a High Efficiency Accelerator for Full Scale Deep Learning Recommendation Models (DLRM) in the Datacenter
Alan Pita, Software Architect, NEUCHIPS
AI Recommender systems, particularly Deep Learning Recommendation Models (DLRM), are the dominant ML application in terms of cloud resource usage. DLRM is a fascinating business and technical challenge. The Social Media and Entertainment industries have far from exhausted the business value that can be achieved with more accurate and more intelligent predictions of consumer/user behavior. Rapid innovation is yielding novel adaptations of DLRM that produce markedly more useful predictions, commanding ever increasing compute capacity under fixed energy and space constraints. Moreover, DLRM is a hybrid dataflow that mates ML models with not-exactly-ML big data analytics.
NEUCHIPS is pioneering a first-of-its-kind engineering approach to accelerating software with purpose-built SoC hardware alongside carefully co-designed compiler and runtime software.
The RecAccel N3000 is purpose-built for AI recommendation inference, especially for DLRM. We will discuss its asynchronous heterogeneous dataflow architecture, where each type of IP/processor is carefully tailored to optimize a component of the DLRM logical architecture. We will also show how the configurable ARC processor efficiently participates in delivering groundbreaking DLRM performance on widely accepted industrial recommendation benchmarks.
How Transformers are Changing the Direction of Deep Learning Architectures
Tom Michiels, System Architect, Synopsys
The neural network architectures used in embedded real-time applications are evolving quickly. Transformers are a leading deep learning approach for natural language processing and other time-dependent, series data applications. Now, transformer-based deep learning network architectures are also being applied to vision applications with state-of-the-art results compared to CNN-based solutions. In this presentation, we will introduce transformers and contrast them with the CNNs commonly used for vision tasks today. We will examine the key features of transformer model architectures and show performance comparisons between transformers and CNNs. We will conclude the presentation with insights on why we think transformers are an important approach for future visual perception tasks.
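For readers new to the topic, the core operation of a transformer is scaled dot-product attention, in which every input token (an image patch, in vision models) is weighted against every other token. A convolution, by contrast, only combines a fixed local neighborhood. The sketch below is a minimal pure-Python version written for illustration; it is not taken from the presentation, and the function names are assumptions.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V, one query at a time."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


# Two tokens with identical keys receive equal weight, so the output is the mean value.
out = attention([[1.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]], [[1.0, 2.0], [3.0, 4.0]])
print(out)
```

Because the score matrix grows with the square of the number of tokens, attention stresses compute and memory differently from convolution, which is one reason the choice matters for embedded hardware.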
Creating an Optimized AI SoC Architecture Using Virtual Prototyping
Mojin Kottarathil, Staff Applications Engineer, Synopsys
Today we see a large variety of SoCs with dedicated accelerators for the efficient processing of AI applications. Successful products in this competitive environment need to be highly optimized for the target application domain. Data-driven architecture analysis is required to optimize the AI processor configuration alternatives and SoC integration choices, such as the dimensioning of the shared interconnect and memory subsystem. Synopsys Platform Architect Virtual Prototyping tools combined with ARC Processor IP architecture models enable early analysis of architecture alternatives and quantitative assessment of IP configuration choices.
In this presentation we will discuss the available IP, tools, and models to accelerate the early analysis and optimization of AI SoC architectures.
Agenda:
- Recent advancements in embedded AI applications and architectures
- Challenges in the design and verification of AI SoCs
- Synopsys DesignWare Processor IP portfolio for the design of AI SoC platforms
- Synopsys Platform Architect Virtual Prototyping solution for early architecture analysis and optimization
- Case-study of an AI SoC platform design with ARC VPX and NPX Processors
- How to get started
Zephyr RTOS for ARC Processors: From "Nano Kernel" to Heterogeneous Cluster
Alexey Brodkin, Engineering Manager, Synopsys
Zephyr RTOS is quickly becoming one of the most popular general-purpose open source Real-Time Operating Systems on the market. Zephyr is more than just an OS kernel: it comes with protocol stacks and drivers that enable building all kinds of embedded applications.
In this session we'll discuss how software features of the Zephyr RTOS can be leveraged across the broad range of ARC processor offerings. We'll start with an overview of ARC cores and features supported in the Zephyr RTOS, and then we will examine some specific use cases that utilize key features of the Zephyr RTOS such as single-threading mode, the POSIX compatibility layer and SMP support for embedded multicore configurations of up to 12 cores.
Bluetooth Low Energy – Growth Segments are Pushing Lower Power Requirements for Battery Powered Devices
Charles Dittmer, Product Marketing Manager, Synopsys
Bluetooth, and Bluetooth Low Energy (BLE) specifically, is now a part of our everyday lives. Shipments are forecast to reach 5.1 billion devices in 2022 and 7 billion by 2026, a CAGR of 9%. While the forecast for the host or “platform” side of the solution (such as mobile phones, tablets and PCs) is relatively flat, the growth of BLE will be on the peripheral side. The predominant applications or use cases driving this growth are hearables (headphones and earbuds) supported by LE Audio, wearables including AR/XR, location services, electronic shelf labels (ESL) and a variety of tags and sensors. These segments have projected growth rates of 12-25% over the next 4-5 years.
All of these major growth segments are battery-powered devices, which drives the need for the most power-efficient solutions possible. Based on its extremely low power requirements, the sub-1-volt BLE IP solution from Synopsys is perfectly suited for integration into these power-sensitive SoCs, extending product lifetimes for non-rechargeable devices and reducing the time between charges for rechargeable devices.
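The compound annual growth rate quoted above follows from the standard formula CAGR = (end / start)^(1 / years) - 1. The sketch below applies it to the rounded shipment figures from the abstract, assuming a four-year span from 2022 to 2026. With those inputs the result is roughly 8%, so the quoted 9% presumably rests on unrounded figures or a different base year; that explanation is an assumption, not something the source states.

```python
def cagr(start, end, years):
    """Compound annual growth rate between two values over a number of years."""
    return (end / start) ** (1.0 / years) - 1.0


# Rounded figures from the abstract, in billions of devices; the span is assumed to be 4 years.
rate = cagr(5.1, 7.0, 4)
print(f"CAGR 2022-2026: {rate:.1%}")
```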
Post-Quantum Cryptography: Theory to Accelerated Practice
Vladimir Soukharev, Ph.D., Principal Cryptographic Technologies & Chief Post-Quantum Researcher, InfoSec Global & Ruud Derwig, Sr. Security/Systems/Processors Architect, Synopsys
Post-Quantum Cryptography (PQC) has received a fair amount of attention over the past few years, especially with the quantum threat becoming a closer reality. NIST’s PQC standardization process is fully underway. Just recently, a big milestone was reached on the path to PQC algorithms becoming the cryptographic default: NIST has announced the first set of standardized PQC algorithms. This means that PQC will be used as widely as, or possibly even more widely than, current conventional cryptography in the near future. This talk will provide an overview of PQC, the standardization process, and current and next practical steps to prepare for the transition to PQC. This transition poses various challenges. It will require crypto agility in protocols and implementations so that today’s algorithms can be seamlessly replaced with the PQC alternatives. Agility in software via firmware updates is much easier than agility in hardware.
However, just like for today’s algorithms, hardware acceleration and hardware implementations are required for PQC to meet performance as well as security targets. In this talk, we’ll explain how acceleration of PQC algorithms can be done in a flexible way, such that a single accelerator can be used for traditional algorithms as well as for various PQC algorithms. Finally, we’ll complete the ‘from software to silicon’ view by covering end-to-end aspects for managing the PQC transition using a service-based architecture to perform the provisioning and security management of the agile crypto solutions embedded in connected devices.
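Crypto agility at the software level can be sketched as a registry: each protected message carries an algorithm identifier, and the verifier dispatches on that identifier rather than hard-coding one algorithm. The example below is a minimal sketch under stated assumptions. Python's standard library has no PQC algorithms, so two HMAC variants stand in for "today's" and "replacement" algorithms, and the names `MAC_ALGORITHMS`, `protect` and `verify` are hypothetical.

```python
import hashlib
import hmac

# Registry of available algorithms, keyed by identifier.
# Both entries are stand-ins; a real deployment would register signature or KEM schemes.
MAC_ALGORITHMS = {
    "hmac-sha256": lambda key, msg: hmac.new(key, msg, hashlib.sha256).digest(),
    "hmac-sha3-256": lambda key, msg: hmac.new(key, msg, hashlib.sha3_256).digest(),
}


def protect(alg_id, key, msg):
    """Tag a message and record which algorithm produced the tag."""
    tag = MAC_ALGORITHMS[alg_id](key, msg)
    return {"alg": alg_id, "msg": msg, "tag": tag}


def verify(envelope, key):
    """Look up the algorithm named in the envelope; reject unknown identifiers."""
    fn = MAC_ALGORITHMS.get(envelope["alg"])
    if fn is None:
        return False
    return hmac.compare_digest(fn(key, envelope["msg"]), envelope["tag"])


old = protect("hmac-sha256", b"shared-key", b"firmware v1")
new = protect("hmac-sha3-256", b"shared-key", b"firmware v1")
print(verify(old, b"shared-key"), verify(new, b"shared-key"))
```

Switching algorithms then means adding a registry entry and changing the identifier used by `protect`, while the verification path stays unchanged.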
Rooting Trust in Hardware with Invisible Keys from SRAM PUF Technology
Pim Tuyls, CEO, Intrinsic ID
The ever-increasing number of connected devices around us introduces major security issues. Connecting billions of devices can only be done securely if every device has some form of dedicated hardware for protecting sensitive data and securing communications. How can this be done in a way that scales with the most advanced technology nodes without becoming cost-prohibitive?
The answer lies with SRAM Physical Unclonable Function (PUF) technology. Combining SRAM PUF technology from Intrinsic ID with the Synopsys embedded tRoot HSM provides a new level of protection by generating secure cryptographic keys based on device-unique variations within the silicon of the chip itself. With the SRAM PUF, the root key is regenerated every time the chip is powered up and is only available in volatile memory when needed. Since the key is never present in persistent memory, it is not stored anywhere on the device, even when the chip is powered down, making it significantly harder for attackers to find. This substantially increases the level of security.
This talk will explain how SRAM PUF eliminates the need for OTP memory, while cost-effectively providing a hardware root of trust. In this presentation, you will learn:
- The fundamentals and benefits of SRAM PUF technology
- How SRAM PUFs allow you to scale your security architecture to the most advanced nodes
- How SRAM PUF technology combines with the Synopsys tRoot HSM
- Some example use cases
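To make the idea of regenerating a key at every power-up concrete, the toy sketch below uses a textbook code-offset construction with a repetition code. Helper data is computed once at enrollment, and on each later power-up a slightly noisy SRAM start-up pattern is corrected back to the same secret bits. This illustrates the general principle only. It is not Intrinsic ID's implementation, all names and parameters are assumptions, and real designs use much stronger error correction and take care that the stored helper data does not leak the key.

```python
import hashlib

REP = 5  # repetition-code length per secret bit (illustrative choice)


def enroll(puf_response, secret_bits):
    """One-time enrollment: helper data = repetition codeword XOR PUF response."""
    codeword = [b for b in secret_bits for _ in range(REP)]
    assert len(codeword) == len(puf_response)
    return [c ^ r for c, r in zip(codeword, puf_response)]


def reconstruct(noisy_response, helper):
    """Power-up: XOR with the helper data, then majority-decode each group of REP bits."""
    noisy_codeword = [h ^ r for h, r in zip(helper, noisy_response)]
    bits = []
    for i in range(0, len(noisy_codeword), REP):
        group = noisy_codeword[i:i + REP]
        bits.append(1 if sum(group) > REP // 2 else 0)
    return bits


def derive_key(bits):
    """Hash the recovered bits into a key that exists only in volatile memory."""
    return hashlib.sha256(bytes(bits)).hexdigest()


# Stand-in for a device-unique SRAM start-up pattern (20 cells).
response = [(i * 7 + 3) % 2 for i in range(20)]
secret = [1, 0, 1, 1]
helper = enroll(response, secret)

# Simulate a later power-up where one cell per group of five flips.
noisy = list(response)
for pos in (0, 6, 12, 18):
    noisy[pos] ^= 1

assert reconstruct(noisy, helper) == secret
print(derive_key(reconstruct(noisy, helper)) == derive_key(secret))
```

Only the helper data is kept in non-volatile storage. The key is recomputed from the silicon's start-up behavior each time it is needed.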
Optimize High Performance Processor Implementation with AI-enabled Fusion QuickStart Kit
Frank Grover & John Moors, Applications Engineers, Synopsys
Get an optimized starting point for implementing Synopsys ARC HS68 64-bit processors for high performance embedded designs with Synopsys Fusion QuickStart Kits (QIKs). The ARC processor QIK includes tool scripts, a baseline floorplan, design constraints and documentation. In this session, you will learn how the QIK was used along with Synopsys Fusion Compiler and Design Space Optimization (DSO.ai) tools to achieve the best PPA and faster time-to-market.
Heterogeneous Multicore Design - An Always-On Use Case
Pat Harmon, Synopsys
Almost all of today’s SoCs are multicore designs, initially driven by the need for higher performance. The need for energy efficiency has since become another driver for multicore design, leading to heterogeneous architectures in which different cores are selected for different processing tasks. In this session we will use the example of an always-on smart home application to illustrate the tradeoffs to be analyzed.