
Designing AI Chips: A Conversation with Thomas Andersen

草榴社区 Editorial Staff

Jul 09, 2021 / 4 min read

Thomas Andersen

What Does It Take to Design AI Chips?

We sat down with Dr. Thomas Andersen, Group Director for 草榴社区 AI and Machine Learning, who recently spoke at the AI Hardware Summit in Mountain View, California, to get his views on how artificial intelligence and machine learning are impacting SoC design.

Q: How long has 草榴社区 been involved in AI hardware?

Thomas Andersen:

As a partner to the world’s most innovative companies, we’ve worked alongside AI pioneers from the very beginning, which has given us a lead in learning about AI hardware design needs, and allowed us to make the necessary investments to address their requirements—from reference flows, to new features in our tools, to the industry’s most comprehensive AI-ready DesignWare® IP portfolio. Today, practically all AI accelerators in data centers worldwide were designed and verified with 草榴社区 products.

Q: What has 草榴社区 learned about the design implications and architectural characteristics of AI accelerators?

Thomas Andersen:

The need for AI acceleration is driven by two tasks: training and inference. While both require maximum speed, they present different requirements. Latency may not be a huge concern for training; however, once the trained model is deployed, latency becomes a critical criterion. For example, object detection in an autonomous vehicle application must happen within milliseconds.
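A minimal back-of-envelope sketch makes that budget concrete; the camera frame rate, vehicle speed, and 10 ms inference figure below are illustrative assumptions, not numbers from the interview.

```python
# Illustrative latency budget for in-vehicle object detection.
# All numbers below are assumptions chosen for the example.

camera_fps = 30              # assumed camera frame rate
vehicle_speed_mps = 30.0     # assumed speed, roughly 108 km/h
inference_latency_s = 0.010  # assumed 10 ms inference latency

frame_budget_ms = 1000.0 / camera_fps
distance_during_inference_m = vehicle_speed_mps * inference_latency_s

print(f"Per-frame budget: {frame_budget_ms:.1f} ms")                               # ~33.3 ms
print(f"Distance traveled during inference: {distance_during_inference_m:.2f} m")  # 0.30 m
```

Even a 10 ms detection latency means the vehicle travels 30 cm before the result is available, which is why edge inference latency is specified in single-digit milliseconds.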

Power is important in both training and inference, but for different reasons. In the data center, the power problem is expressed as cost per watt for an AI accelerator. One of the reasons Google built its Tensor Processing Unit (TPU) is that it is significantly more efficient on a performance-per-watt basis than GPUs or CPUs for typical AI processing loads. At the edge, the power problem is expressed differently: many edge applications are battery powered, requiring designers to focus on power-saving techniques that squeeze out every last milliwatt.
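A toy calculation shows why the performance-per-watt metric favors purpose-built silicon; the throughput and power numbers below are hypothetical, not published figures for any device.

```python
# Performance-per-watt comparison with purely hypothetical numbers.

accelerators = {
    # name: (throughput in TOPS, power draw in watts) -- assumed values
    "CPU":         (1.0, 150.0),
    "GPU":         (100.0, 300.0),
    "AI ASIC/TPU": (90.0, 75.0),
}

for name, (tops, watts) in accelerators.items():
    print(f"{name:12s} {tops / watts:6.2f} TOPS/W")

# Even if the ASIC's raw throughput were lower than the GPU's, its
# performance per watt -- the data-center cost metric -- can be several
# times higher, which is the motivation Andersen describes for the TPU.
```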

Q: Speed and power sound like competing requirements. Are they?

Thomas Andersen:

They absolutely can be. There is a lot of algorithmic innovation in the AI space, and many diverse architectures are therefore being proposed to optimize performance and energy efficiency. Such architectures typically consist of highly parallel, largely replicated computation topologies, with millions of neurons to train. Rather than storing all this data in large central memories, thousands of high-performance memories must be distributed throughout the entire chip. These topologies, combined with the highly repetitive nature of the matrix computations in neural networks, create many pinch points within the chip design and verification process.
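A small sketch of the idea, assuming a hypothetical array of processing elements, each with a tiny local buffer: tiling a matrix multiply so that every block of work touches only a small slice of data mirrors how these replicated topologies spread operands across thousands of distributed on-chip memories.

```python
import numpy as np

# Tiled matrix multiply: each (i, j) tile stands in for one hypothetical
# processing element working out of its own small local buffer, rather
# than all elements contending for one large central memory.

TILE = 4  # assumed per-element local-buffer dimension

def tiled_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    n = a.shape[0]                       # assumes square, TILE-divisible matrices
    c = np.zeros((n, n))
    for i in range(0, n, TILE):
        for j in range(0, n, TILE):
            for k in range(0, n, TILE):
                # These small blocks are what would live in the thousands
                # of distributed high-performance memories Andersen mentions.
                c[i:i+TILE, j:j+TILE] += a[i:i+TILE, k:k+TILE] @ b[k:k+TILE, j:j+TILE]
    return c

a = np.random.rand(16, 16)
b = np.random.rand(16, 16)
assert np.allclose(tiled_matmul(a, b), a @ b)  # same result, tiled access pattern
```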

Q: How does 草榴社区 address the challenge of AI chip design?

Thomas Andersen:

It’s important to have a comprehensive approach and cover all aspects of the design—from accelerating algorithmic innovation, to quickly piecing together diverse architectures, to finally providing the best possible physical implementation all the way to manufacturing signoff.

Q: What do you mean by “algorithmic innovation”?

Thomas Andersen:

An application like object detection, for example, needs to map to a dataset and then to a specific algorithm. At that point, a hardware accelerator architecture can be proposed and implemented. How can one know if it’s an optimal solution? Simulating even three sixteen-by-sixteen-pixel images on a convolutional neural network (CNN) would bring any RTL simulator to its knees. Therefore, the only way to explore AI architectures is to use virtual and hardware prototyping and emulation solutions. The 草榴社区 Verification Continuum’s Platform Architect virtual prototyping, HAPS FPGA-based prototyping, and ZeBu hardware emulation solutions make architectural exploration feasible, and verification of the resulting extraordinarily large and complex implementations practical.
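A rough operation count explains the claim; the layer shape below is an assumed example, and the simulator speed is only an order-of-magnitude figure.

```python
# Rough MAC count for one small convolution layer on a 16x16 image.
# The layer shape is an assumed example.

h = w = 16             # image resolution from the example above
k = 3                  # assumed 3x3 convolution kernel
c_in, c_out = 3, 32    # assumed input/output channel counts

macs = h * w * k * k * c_in * c_out
print(f"{macs:,} multiply-accumulates for a single layer")  # 221,184

# A full CNN has dozens of such layers (most far larger), while a
# cycle-accurate RTL simulator advances on the order of tens to
# thousands of clock cycles per wall-clock second -- so even three
# tiny images imply simulation runs measured in days or weeks.
```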

Q: Once you have selected the right AI architecture, how do you create an AI chip?

Thomas Andersen:

Each AI application has specialized compute, memory, and interconnect requirements. Beyond the AI accelerator functionality, the chip will likely include a myriad of other components. For example, a data center device must have reliable and configurable connectivity to AI data centers, while an edge device will include real-time interfaces to sensors, images, audio, and more.

Memory selection is particularly critical to meeting low-latency access requirements at low power. 草榴社区’ silicon-proven DesignWare IP portfolio addresses the diverse processing, memory, and connectivity requirements of AI markets, including mobile, IoT, data center, automotive, and digital home. Processors manage massive and changing compute requirements for machine and deep learning tasks; memory IP solutions support efficient architectures for different memory constraints, including bandwidth, capacity, and cache coherency; and interface IP solutions provide reliable connectivity to CMOS image sensors, microphones, and motion sensors for AI applications, including vision, natural language understanding, and context awareness.
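As a back-of-envelope illustration of that memory pressure, consider a hypothetical 10 tera-MAC/s accelerator that fetched every operand from memory with no on-chip reuse; the throughput and operand sizes are assumptions for the sketch.

```python
# Worst-case bandwidth for a hypothetical accelerator with no data reuse.

macs_per_second = 10e12   # assumed sustained 10 tera-MACs/s
bytes_per_mac = 2         # assumed: one int8 weight + one int8 activation

required_bw = macs_per_second * bytes_per_mac   # bytes per second
print(f"{required_bw / 1e12:.0f} TB/s without on-chip reuse")   # 20 TB/s

# No external DRAM interface comes close to 20 TB/s, which is why the
# memory hierarchy -- distributed SRAM, caches, and high-bandwidth
# memory IP -- is a first-order architectural decision, not an afterthought.
```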

Q: Don’t you still have to design the actual physical device?

Thomas Andersen:

Very much so, because this is the point where it all comes together. Densely populated computational units interspersed with thousands of high-bandwidth memories require AI designs to be built using high-performance, deep submicron (DSM) nodes at 16nm FinFET and below. Recent data center-oriented architectures are packing together more than 20 billion transistors and hundreds or thousands of processing modules at speeds that can exceed 5GHz. At the edge, designers are running 1GHz+ inference engines that need to operate in extreme temperature and voltage corners.
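A quick multiplication conveys the scale, using round assumed values in the range Andersen cites rather than the specs of any particular chip.

```python
# Aggregate throughput implied by the figures above (round assumed values).

processing_modules = 1000    # "hundreds or thousands of processing modules"
clock_hz = 5e9               # "speeds that can exceed 5GHz"
macs_per_module_cycle = 1    # assumed one MAC per module per cycle

ops_per_second = processing_modules * clock_hz * macs_per_module_cycle * 2
print(f"{ops_per_second / 1e12:.0f} TOPS aggregate")   # 10 TOPS (1 MAC = 2 ops)
```

Every one of those operations toggles logic and reads distributed memories, which is what makes timing, power, and congestion closure at DSM nodes so difficult.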

Achieving these goals in a physical device is a tremendous challenge and requires constant innovation alongside our design and foundry partners. Our 草榴社区 Design Platform has introduced several key AI-focused optimization technologies to achieve optimal power, performance, and area (PPA), particularly at these DSM process nodes.

Q: The AI revolution is disrupting many sectors of the global economy. How about chip design?

Thomas Andersen:

A recent study by NVIDIA, presented at this year’s 草榴社区 Users Group (SNUG) conference in Silicon Valley, concluded that, despite amazing advancements in automation, about 70 percent of total design turnaround time still requires manual input. Most of this time isn’t spent inventing novel features; it is spent piecing together complex design flows, managing trade-offs, and debugging large data problems.

We firmly believe that AI technologies like machine learning (ML) can help address such high-complexity, high-cost challenges, not only for AI designs but for all kinds of designs. For example, our recently announced DSO.ai solution learns continuously to improve customer environments, a marked departure from traditional systems. AI-enhanced tools boost designer productivity by speeding up computationally intensive analyses, predicting results that drive better decision-making, and leveraging past learning to intelligently guide debug.
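To illustrate only the "predicting results" idea, here is a toy sketch, not a description of 草榴社区's actual technology: a regressor trained on the outcomes of past runs ranks candidate configurations so the expensive full flow runs only on the most promising one. The features and labels below are invented for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical features per past run: utilization, clock target, avg fanout
X_past = rng.random((200, 3))
# Hypothetical label: worst slack observed for that run (synthetic data)
y_past = 0.5 - X_past @ np.array([0.3, 0.6, 0.1]) + rng.normal(0, 0.02, 200)

# Learn from past runs to predict slack without running the full analysis.
model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_past, y_past)

candidate_runs = rng.random((5, 3))
predicted_slack = model.predict(candidate_runs)
best = candidate_runs[np.argmax(predicted_slack)]
print("Run the full flow only on the most promising configuration:", best)
```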

Q: In closing, what do you see as the main challenge for the future of the AI chip industry?

Thomas Andersen:

We are at the very beginning of a race to find the next processing architecture. We’ve seen it before with CPUs, DSPs, and GPUs. The biggest challenge will be standardization, both of the AI frameworks used to train models and of the ability to map those models onto these new architectures. This is an exciting space for us, and we are executing on a multi-year 草榴社区 initiative to broaden investment in ML technology, alongside leading industry partners.
