Digital signal processing is all around us. Today’s devices come with dozens of sensors, the data from which must be filtered and aggregated for use by artificial intelligence (AI) models. As these AI workloads become ubiquitous across all industries, embedded systems face a growing demand for powerful and efficient signal processing. These compute-intensive AI algorithms often have a limited amount of control code, primarily operate on data streams, and require hard real-time performance with low latency constraints.
Therefore, Digital Signal Processors (DSPs) have become an integral piece of the equation. Unlike general-purpose processors, DSPs offer parallel execution of vectorized computations for minimized cycle counts and latency. However, increasing vector size also requires more silicon area, making it necessary to strike the right balance between performance and efficiency.
This is why the industry needs more choices. Designers must pick a DSP that meets their application-specific performance requirements while staying within a given area and power budget. Fortunately, the 草榴社区 ARC VPX product family has just been extended to serve that need.
The new ARC VPX6 DSP IP introduces 1024-bit vector processing to the lineup, extending the existing VPX family, which includes VPX5 (512-bit), VPX3 (256-bit), and VPX2 (128-bit) variants. As such, VPX6 doubles the attainable peak performance while preserving full software compatibility with the other members of the VPX family. By upgrading the hardware without recoding, designers can realize performance gains from day one.
The ARC VPX DSP IP family is a line of high-performance vector DSPs designed for low-power, high-throughput computation. These processors are widely used in automotive sensing, AI-driven vision systems, radar/LiDAR, and industrial automation, where real-time processing of massive data streams is paramount.
Earlier VPX processors supported 128-bit, 256-bit, and 512-bit vector lengths, allowing developers to select the most efficient processing configurations for their workloads. These processors provide robust multi-core scalability, but as data volumes continue to grow — with higher-resolution cameras, denser sensor arrays, and more complex AI models — many applications demand even greater parallel processing efficiency.
The new ARC VPX6 expands the VPX family with 1024-bit vector processing, effectively doubling the data throughput of VPX5. While maintaining backward compatibility with previous VPX processors, VPX6 delivers a scalable, high-efficiency solution to meet this new generation of AI and embedded system demands.
The central improvement of ARC VPX6 is its introduction of 1024-bit vector Single Instruction, Multiple Data (SIMD) processing to significantly increase computational efficiency. By leveraging a SIMD architecture, VPX6 applies a single operation across multiple data points simultaneously, reducing the number of required compute cycles. Compared to the 512-bit processing of VPX5, which executes 64 single-precision floating-point operations per cycle, VPX6 executes double the operations, with 128 per cycle. The result is a major performance boost for workloads like image filtering, batch Fast Fourier Transform (FFT) in radar, AI pre-processing, and sensor fusion.
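To make the SIMD idea concrete, here is a minimal sketch in plain C: one wide operation applied across 32 single-precision values, which is exactly what fits in a 1024-bit vector. It uses the GCC/Clang vector_size extension purely for illustration on a host compiler and is not VPX-specific code or a 草榴社区 API.

```c
/* A 1024-bit vector of 32 single-precision floats, using the GCC/Clang
 * vector_size extension purely to illustrate the SIMD idea; this is
 * not VPX-specific code. 128 bytes = 1024 bits = 32 x fp32. */
typedef float v32f __attribute__((vector_size(128)));

/* Scalar version: 32 iterations, one multiply-accumulate at a time. */
void mac_scalar(const float a[32], const float b[32], float c[32])
{
    for (int i = 0; i < 32; i++)
        c[i] += a[i] * b[i];
}

/* SIMD version: the same 32 multiply-accumulates expressed as one wide
 * operation; on hardware with 1024-bit vector units this maps to a
 * handful of instructions instead of 32 scalar ones. */
void mac_vector(const v32f *a, const v32f *b, v32f *c)
{
    *c += *a * *b;
}
```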
Beyond raw compute power, ARC VPX6 also features a sophisticated Direct Memory Access (DMA) engine that keeps a continuous flow of data moving to the processor, preventing idle cycles and maximizing throughput. With support for double buffering, VPX6 effectively hides memory latency so the vector units stay busy.
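The double-buffering (ping-pong) scheme can be sketched in a few lines of C: while the core computes on one buffer, the DMA engine fills the other. The dma_start, dma_wait, and process_tile functions below are hypothetical stand-ins, implemented synchronously here so the sketch is self-contained; they are not the actual VPX DMA driver API.

```c
#include <stddef.h>
#include <string.h>

#define TILE 1024               /* floats per DMA transfer (illustrative size) */

/* Stand-ins for the platform's asynchronous DMA API (hypothetical names).
 * Here dma_start copies synchronously and dma_wait is a no-op so the
 * sketch runs anywhere; on real hardware the transfer would overlap
 * with computation. */
static void dma_start(float *dst, const float *src, size_t n)
{
    memcpy(dst, src, n * sizeof(float));
}
static void dma_wait(void) { }

static void process_tile(float *tile, size_t n)
{
    for (size_t i = 0; i < n; i++)   /* placeholder compute kernel */
        tile[i] *= 0.5f;
}

/* Ping-pong (double-buffered) streaming: while the core works on one
 * buffer, the DMA engine fills the other, hiding memory latency. */
void process_stream(const float *src, size_t total_tiles)
{
    static float buf[2][TILE];
    int cur = 0;

    dma_start(buf[cur], src, TILE);              /* prime the pipeline */

    for (size_t t = 0; t < total_tiles; t++) {
        dma_wait();                              /* tile t is now resident */
        int next = cur ^ 1;
        if (t + 1 < total_tiles)
            dma_start(buf[next], src + (t + 1) * TILE, TILE);  /* prefetch t+1 */
        process_tile(buf[cur], TILE);            /* compute on tile t */
        cur = next;
    }
}
```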
Consider that vector processing is subject to diminishing returns: how far can one take the parallelization, and at which stage is it better to distribute the task to multiple cores that work in parallel? The answers to these questions depend on the application-specific workload. But, thanks to VPX6, designers have more options from which to choose.
With full backward compatibility with VPX2, VPX3, and VPX5, the ARC VPX6 DSP IP is designed to integrate easily with new or existing designs. For example, we offer vector-length agnostic libraries that ensure software written for previous VPX processors runs on VPX6 without modification. Such compatibility shortens development cycles and gives developers an easy upgrade path to scale their solutions without reworking existing codebases. The vector-length agnostic approach enables complete product families that serve multiple performance requirements (i.e., low- to high-end) based on the same software infrastructure.
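A rough illustration of the vector-length agnostic idea, in plain C: the vector width is a build-time parameter rather than something hard-coded in the source, so the same kernel recompiles for a 512-bit VPX5 target or a 1024-bit VPX6 target and simply finishes in fewer wide iterations on the latter. VECTOR_LANES is a hypothetical macro used for this sketch, not a 草榴社区 toolchain symbol.

```c
#include <stddef.h>

/* In a real flow the lane count would come from the toolchain for the
 * selected target, e.g. 16 fp32 lanes for a 512-bit build or 32 lanes
 * for a 1024-bit build; the value below is only a default for this
 * sketch. */
#ifndef VECTOR_LANES
#define VECTOR_LANES 32
#endif

/* Vector-length agnostic dot product: the source never hard-codes the
 * vector width, so the same code recompiles unchanged for any VPX
 * configuration and simply runs fewer outer iterations on a wider
 * machine. */
float dot(const float *a, const float *b, size_t n)
{
    float acc = 0.0f;
    size_t i = 0;
    for (; i + VECTOR_LANES <= n; i += VECTOR_LANES)
        for (size_t j = 0; j < VECTOR_LANES; j++)   /* one wide chunk */
            acc += a[i + j] * b[i + j];
    for (; i < n; i++)                              /* scalar remainder */
        acc += a[i] * b[i];
    return acc;
}
```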
Like all members of the VPX family, VPX6 is highly configurable. For instance, designers can define parameters such as the number of registers, the size of L1 and L2 memory, or whether to integrate math and FFT accelerators into their architecture. Such flexibility lets engineers match the architecture to their application, minimizing area and power overhead.
For further scalability, VPX6 is available in multiple configurations: single-core (VPX6), dual-core (VPX6x2), and quad-core (VPX6x4). These fully integrated, multicore solutions come with shared DMA engines, memory coherency mechanisms, synchronization support, and runtime libraries, allowing developers to optimize performance and power efficiency based on specific workload demands.
All members of the VPX family leverage patented ARC Processor Extensions (APEX) technology, which allows designers to create user-defined scalar and vector instructions. It also enables the integration of custom hardware accelerators that improve application-specific performance while reducing power consumption and the amount of memory required. VPX processors also benefit from a rich set of pre-optimized libraries for DSP, linear algebra, and vision processing, including vision kernels tailored for OpenCV-style operations. These include ready-to-use software functions for color conversion, solvers, edge detection, object tracking, matrix transformations, and FFTs, to name a few.
New features designed for VPX6 — including enhanced DMA, which handles long memory latencies, as well as support for OCP-MX, a new industry standard for the compact storage of data for AI applications — will also be extended to the entire VPX family.
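For context on the data format, OCP MX (Microscaling) stores tensor data as small blocks of narrow elements that share one power-of-two scale factor, which is what makes the storage compact. The sketch below decodes one such block, assuming 32 elements per block, 8-bit integer elements, and a bias of 127 for the shared 8-bit exponent; the standard also defines FP8/FP6/FP4 element variants, so treat this as an illustration of the layout rather than a reference decoder.

```c
#include <stdint.h>
#include <math.h>

/* One microscaled block, roughly following the OCP MX layout: a group
 * of narrow elements sharing a single power-of-two scale. A block size
 * of 32 and int8 elements are assumed here for illustration only. */
#define MX_BLOCK 32

typedef struct {
    uint8_t shared_scale;       /* 8-bit exponent-only scale (power of two) */
    int8_t  elem[MX_BLOCK];     /* narrow per-element values */
} mx_block_int8;

/* Expand one block to float for a kernel that wants fp32. The bias of
 * 127 for the shared exponent is an assumption of this sketch; see the
 * OCP MX specification for the exact encodings. */
static void mx_block_decode(const mx_block_int8 *blk, float out[MX_BLOCK])
{
    float scale = ldexpf(1.0f, (int)blk->shared_scale - 127);
    for (int i = 0; i < MX_BLOCK; i++)
        out[i] = scale * (float)blk->elem[i];
}
```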
With so many benefits to the designer, the ARC VPX6 is a perfect fit for real-time, high-performance signal processing applications such as automotive sensing, AI-driven vision systems, radar/LiDAR, and industrial automation.
The 草榴社区 ARC VPX6 DSP IP takes digital signal processing to the next level. By doubling the performance of VPX5, it reduces compute cycles and power consumption while maintaining full software compatibility for easy adoption. With greater scalability and efficiency, VPX6 gives designers more flexibility to optimize power, performance, and area (PPA) for their specific workloads.
Moving forward, we will continue to advance the ARC VPX processor family to meet the evolving demands of AI and embedded systems. Through industry-leading tools, optimized libraries, and pain-free integration options, we are empowering developers to push the boundaries of high-performance signal processing.