By: Ron Lowman, Product Marketing Manager, 草榴社区
Over the past decade, a few key advancements have made artificial intelligence (AI) one of the most exciting technologies of our lifetime. In 2012, Geoffrey Everest Hinton demonstrated his generalized backpropagation neural network algorithm in the ImageNet challenge, which revolutionized the field of computer vision. However, the math had been developed years before 2012; it was the available microprocessors, such as the NVIDIA GTX 580 graphics processing unit (GPU), that enabled this milestone. These processors offered relatively high memory bandwidth and excelled at matrix multiplication, reducing the training time of this neural network model to about one week. This combination of mathematics and processing capability has set in motion a new generation of technology advancements and an entirely new world of possibilities related to AI. This article outlines the new era of AI design and its diverse processing, memory, and connectivity needs.
Neural networks fall under deep learning, which is a subset of machine learning, which is in turn a subset of AI, as shown in Figure 1. This is an important classification because it is not AI broadly, or even machine learning, that is changing system-on-chip (SoC) architecture designs; it is the subset known as deep learning.
Figure 1: AI mimics human behavior using deep learning algorithms
Deep learning is not only changing the makeup of SoCs but also spawning a new generation of investments in the semiconductor market. Deep learning algorithmic models, such as convolutional neural networks (CNNs), are heavily used by both the R&D community and commercial ventures. CNNs have been the primary focus for machine vision, while models such as recurrent neural networks (RNNs) have found applicability in natural language understanding because of their ability to recognize sequences over time.
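To illustrate why these models map so naturally onto matrix-multiplication hardware, here is a minimal sketch of a single 2D convolution, the core CNN operation, in plain NumPy. The image and kernel sizes are arbitrary assumptions chosen for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """Naive 2D convolution (valid padding), the core CNN operation.

    Every output pixel is a small dot product, which is why CNN
    accelerators are built around dense multiply-accumulate arrays.
    """
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Example: a 3x3 edge-detection kernel over an assumed 8x8 grayscale image
image = np.random.rand(8, 8)
kernel = np.array([[-1, -1, -1],
                   [-1,  8, -1],
                   [-1, -1, -1]], dtype=float)
print(conv2d(image, kernel).shape)  # (6, 6)
```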
Deep learning neural networks are used in many different applications, giving powerful new tools to those who leverage them. For example, they enable advanced security threat analysis that predicts and prevents breaches, and they help advertisers identify and streamline the sales process by predicting the path potential buyers follow. These are two examples of data center applications that run on server farms featuring the latest GPU and AI accelerator semiconductor technologies.
But AI designs are not confined to the data center. Many new functions, such as vision systems for object and facial detection, natural language understanding for improved human-machine interfaces, and context awareness, make it possible to understand what activities are taking place based on a combination of sensor inputs. These deep learning capabilities are being added to SoCs in all markets, including automotive, mobile, digital home, data center, and Internet of Things (IoT), as shown in Figure 2.
Figure 2: AI capabilities have been added to a wide range of applications
The mobile phone utilizes neural networks for many of the AI functions described above: it runs facial recognition apps, object identification apps, and natural language understanding apps. In addition, it uses neural networks internally for 5G self-organization as wireless signals become denser, span additional mediums and spectrums, and carry data with differing priorities.
Deep learning has only recently been made feasible by advancements in both mathematics and semiconductor hardware. There are several efforts to better replicate the human brain in next-generation math models and semiconductor architectures, an approach often referred to as neuromorphic computing. The human brain is incredibly efficient, and technology is only beginning to scratch the surface of replicating it: the brain incorporates over a petabyte of memory storage and is equivalent to about 540 trillion transistors, at a power footprint of less than 12 watts. At this point, replicating the brain is a stretch goal. However, the ImageNet challenge has progressed from the first backpropagation-trained CNN in 2012 to a more advanced model, ResNet-152, in 2015, which achieves an error rate better than that of humans. The market is moving quickly, with new algorithms published often and semiconductor vendors rapidly integrating the needed features to outpace their competitors.
There are several critical changes to SoC architectures that incorporate deep learning capabilities. These design modifications impact both highly specialized solutions and more general-purpose AI SoC designs, and they include specialized processing needs, innovative memory architectures, and real-time data connectivity.
SoCs adding neural network capability must accommodate both heterogeneous processing and massively parallel matrix multiplication. The heterogeneous component requires scalar, vector DSP, and neural network processing capabilities. Machine vision, for example, involves individual stages, each of which requires a different type of processing, as shown in Figure 3.
Figure 3: Neural network capabilities require unique processing
The pre-processing stage requires relatively simple data-level parallelism. The precise processing of selected regions requires more complex data-level parallelism that can be efficiently handled by dedicated CNN accelerators with strong matrix multiplication capabilities. The decision-making stages can commonly be handled with scalar processing. Each application is unique, but what is clear is that heterogeneous processing solutions, including acceleration of neural network algorithms, are required to handle AI models efficiently.
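To make the stage-by-stage contrast concrete, here is a minimal sketch of such a vision pipeline in NumPy. The stage boundaries, array sizes, and function names are illustrative assumptions; comments note which class of hardware each stage would map to.

```python
import numpy as np

def preprocess(frame):
    # Stage 1: simple data-level parallelism (vector DSP territory):
    # normalize pixels; the same operation is applied to every element.
    return (frame - frame.mean()) / (frame.std() + 1e-8)

def cnn_features(patch, weights):
    # Stage 2: complex data-level parallelism (CNN accelerator territory):
    # dense matrix multiplication dominates the cycle count.
    return np.maximum(patch.reshape(1, -1) @ weights, 0.0)  # ReLU(x @ W)

def decide(features, threshold=0.5):
    # Stage 3: decision making (scalar processor territory):
    # branchy, low-parallelism control logic.
    return "object" if features.max() > threshold else "background"

# Illustrative sizes: a 16x16 region of interest, 256-input/10-output layer
frame = np.random.rand(64, 64)
weights = np.random.rand(256, 10) * 0.01
patch = preprocess(frame)[:16, :16]
print(decide(cnn_features(patch, weights)))
```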
AI models use a significant amount of memory, adding cost to the silicon. Training neural networks can require gigabytes to tens of gigabytes of data, creating a need for the highest capacities offered by DDR. As an example, VGG-16, an image neural network, requires about 9 GB of memory to train; a more accurate model, VGG-512, requires 89 GB. To improve the accuracy of an AI model, data scientists use larger datasets, which again either increases the time it takes to train the model or increases the memory requirements of the solution. Due to the massively parallel matrix multiplication required, and the size of the models and number of coefficients involved, external memories with high-bandwidth access are required. New semiconductor interface IP such as High Bandwidth Memory (HBM2) and future derivatives (HBM2e) are seeing rapid adoption to accommodate these needs. Advanced FinFET technologies, which enable larger on-chip SRAM arrays and unique configurations with custom memory-to-processor and memory-to-memory interfaces, are being developed to better replicate the human brain and address the memory constraints.
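A back-of-the-envelope calculation shows why training footprints grow so quickly: weights alone are only part of the story, since gradients, optimizer state, and batched activations must also be resident. The parameter and activation counts below are assumptions for illustration, not the actual VGG configurations.

```python
# Rough training-memory estimate: weights + gradients + optimizer state
# + batched activations, all in 32-bit floats (4 bytes each).
BYTES_PER_FLOAT = 4

params = 138_000_000                 # assumed parameter count (~VGG-16 scale)
activations_per_sample = 15_000_000  # assumed activation count per image
batch_size = 32

weights   = params * BYTES_PER_FLOAT
gradients = params * BYTES_PER_FLOAT                         # one per weight
optimizer = params * BYTES_PER_FLOAT                         # e.g., momentum buffer
acts      = activations_per_sample * batch_size * BYTES_PER_FLOAT  # saved for backprop

total_gb = (weights + gradients + optimizer + acts) / 1e9
print(f"~{total_gb:.1f} GB")  # roughly 3.6 GB under these assumptions
```

Larger batches, deeper models, and higher-resolution inputs scale the activation term directly, which is what pushes real training workloads into the multi-gigabyte range cited above and drives demand for high-bandwidth external memory.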
AI models can be compressed, a required technique to ensure the models can operate within the constrained memory architectures of SoCs at the edge, in mobile phones, automobiles, and IoT applications. Compression is done using techniques called pruning and quantization, ideally without reducing the accuracy of the results. This enables traditional SoC architectures, featuring LPDDR or in some cases no external memory, to support neural networks; however, there are power consumption and other tradeoffs. As these models are compressed, irregular memory accesses and irregular compute intensities increase, prolonging execution time and system latency. Therefore, system designers are developing innovative, heterogeneous memory architectures.
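For concreteness, the sketch below applies the two compression techniques named above, magnitude pruning and linear 8-bit quantization, to a random weight matrix in NumPy. The 50% sparsity target is an arbitrary assumption, and real deployments typically retrain after pruning to recover accuracy.

```python
import numpy as np

def prune(weights, sparsity=0.5):
    """Magnitude pruning: zero out the smallest weights.

    Zeroed weights need not be stored or multiplied, but the surviving
    weights are scattered irregularly, which is the source of the
    irregular memory-access patterns noted above.
    """
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_int8(weights):
    """Linear 8-bit quantization: store one float scale plus int8 values,
    cutting weight storage roughly 4x versus float32."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

w = np.random.randn(128, 128).astype(np.float32)
q, scale = quantize_int8(prune(w))
print(f"sparsity: {np.mean(q == 0):.0%}, scale: {scale:.4f}")
```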
Once an AI model is trained and possibly compressed, it is ready to execute on real-time data delivered through many different interface IP solutions. For example, vision applications are supported by CMOS image sensors connected via MIPI Camera Serial Interface (CSI-2) and MIPI D-PHY IP. LiDAR and radar can be supported via several technologies, including PCI Express and MIPI. Microphones transmit voice data through connections such as USB, pulse density modulation (PDM), and I2S. Digital televisions support HDMI and DisplayPort connections to transmit video content that can be improved after transmission with neural networks enabling super resolution, producing higher-quality pictures from less data. Many, if not most, TV manufacturers are looking at deploying this technology.
Hybrid AI systems are another concept expected to see broader adoption. For instance, a heart-rate algorithm on a fitness band identifies anomalies with AI, accepting some false positives, and sends the information to the cloud, where a more accurate, in-depth AI neural network analyzes the anomaly to determine proper action. This type of technology has already been successfully deployed to balance loads on electrical grids, especially in the case of downed power lines or unexpectedly heavy loads. To support a fast, reliable network to the cloud, Ethernet connectivity is required in the aggregators in the above examples.
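A minimal sketch of this hybrid pattern follows: a cheap on-device check flags candidate heart-rate anomalies and escalates only those samples for deeper cloud-side analysis. The z-score threshold, window sizes, and the `send_to_cloud` stub are hypothetical choices for illustration, not a production design.

```python
from statistics import mean, stdev

def send_to_cloud(window):
    # Hypothetical stub: on a real device this would queue the samples
    # for the cloud-side neural-network analysis over the network.
    print(f"escalating {len(window)} samples to cloud model")

def edge_monitor(heart_rates, z_threshold=3.0):
    """Cheap on-device screen: flag beats far from the recent baseline.

    False positives are acceptable here; the heavier cloud model makes
    the final call, so the edge check can stay small and low-power.
    """
    window = []
    for bpm in heart_rates:
        if len(window) >= 10:
            mu, sigma = mean(window), stdev(window)
            if sigma > 0 and abs(bpm - mu) / sigma > z_threshold:
                send_to_cloud(window + [bpm])
        window.append(bpm)
        window = window[-30:]  # keep a rolling 30-sample baseline

sample_rates = [72, 74, 71, 73, 72, 75, 74, 73, 72, 74, 140, 73, 72]
edge_monitor(sample_rates)  # escalates when the 140 bpm outlier appears
```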
Although there is a long way to go to replicate the human brain, it has been an effective model for building AI systems and continues to be studied by leading research institutions worldwide. The newest neural networks attempt to copy its efficiency and computing capabilities, and SoC architectures are likewise beginning to emulate the brain by tightly coupling processors and memory. ARC subsystems include the processing capabilities needed for AI through their APEX extensions and pervasive RISC architecture, and they tightly couple both peripherals and memories to the processor to address the critical memory bottlenecks.
AI, and specifically deep learning neural networks, is a once-in-a-lifetime technology development. It has been fast-tracked by a combination of innovations in neural network algorithms and in high-bandwidth, high-performance semiconductor designs.
草榴社区 is working with many of the leading providers of AI SoCs across the world in every market segment. This experience has proven valuable in driving the adoption of proven, reliable IP solutions that lower risk, expedite time-to-market, and enable critical differentiation for AI designers.
草榴社区 provides many specialized processing solutions; a range of options from memory interface IP to on-chip SRAM compilers with TCAMs and multi-port memories to address memory bottlenecks; and a full portfolio of connectivity options for real-time data. These IP solutions are critical components of next-generation AI designs.
For more information: