
Creating an AI Accelerator Chip in 18 Months with Neuchips

Cl Chen, Kevin Wei, Kinny Chen, Rich Collins

Aug 31, 2022 / 4 min read

AI-powered recommendation applications are opening up new avenues to enhance the customer experience. With this technology, online stores can highlight other items to add to digital shopping carts, digital music services can suggest songs based on tunes already in the rotation, and social media channels can offer up content that might fit the user’s interests. When these systems work seamlessly and deliver accurate suggestions, they can also bring more dollars to the bottom line. However, a significant amount of challenging engineering work goes on behind the scenes to produce accurate recommendations.

AI accelerators are a critical part of the technology stack for recommendation systems. Their speed and energy efficiency, as measured in inferences per Joule of energy, are key to the prediction accuracy these systems can deliver. In 2019, Meta (then Facebook) called on the industry to work on hardware acceleration for recommendation systems, based on its open-source deep learning recommendation model (DLRM). That call to action inspired the engineering team at Neuchips Inc. to rally around the problem of providing increased recommender model capacity that scales in an Open Compute Project (OCP) form factor. In the race to meet Meta’s request, the young company announced this summer that it has taped out its first DLRM accelerator, the RecAccel-N3000, in Taiwan.

Designed for data center recommendation models, the RecAccel-N3000 has achieved one million DLRM inferences per Joule of energy (which translates into 20 million inferences per second for a 20-Watt chip). The AI accelerator, developed with support and EDA tools from 草榴社区 and other semiconductor industry leaders, will be manufactured on TSMC’s 7nm process, with samples scheduled to be ready at the end of 2022.
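
As a quick sanity check, the throughput claim follows directly from the efficiency figure and the chip’s power envelope: Watts (Joules per second) times inferences per Joule gives inferences per second. The short snippet below simply restates that arithmetic with the numbers quoted above.

```python
# Back-of-the-envelope check of the quoted figures:
# inferences/Joule x Watts (Joules/second) = inferences/second
inferences_per_joule = 1_000_000   # one million DLRM inferences per Joule
chip_power_watts = 20              # 20-Watt chip

inferences_per_second = inferences_per_joule * chip_power_watts
print(f"{inferences_per_second:,} inferences/s")  # 20,000,000 inferences/s
```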

In this blog post, we’ll provide more details about how Neuchips, with a team of about 30 engineers, was able to tape out its 400mm² AI chip in just 18 months, a process that would typically require more than 100 engineers over the course of 3 to 4 years. Another opportunity to learn more about Neuchips comes during its presentation, “Design of a High-Efficiency Accelerator for Full-Scale Deep Learning Recommendation Models (DLRM) in the Datacenter,” at the upcoming ARC Processor Summit 2022 on Thursday, September 8, at the Santa Clara Marriott. The company’s session is scheduled from 2:35 p.m. to 3:20 p.m. PDT.


Direct-to-ASIC Approach for Data Center Inference

AI recommendation systems, especially DLRMs, are the dominant machine learning application when it comes to cloud resource usage. Novel adaptations of DLRMs are generating more useful predictions, while requiring more compute capacity within fixed energy and space constraints. Neuchips is pioneering a unique “direct-to-ASIC” engineering approach that accelerates software with a purpose-built, domain-specific AI accelerator plus co-designed compiler and runtime software. In the company’s asynchronous, heterogeneous dataflow architecture, each type of IP and processor is carefully tailored to optimize a component of the DLRM logical architecture. The configurable 草榴社区 ARC processors, with their low power consumption and high performance, play an integral role in the groundbreaking performance of the RecAccel-N3000.

Other features of the RecAccel-N3000 include:

  • 160MB on-die SRAM
  • 4×64 LPDDR5 with inline error correction code (ECC)
  • Up to 128GB of on-card DRAM
  • Up to 16 lanes of PCI Express (PCIe) 3.0, 4.0, and 5.0
  • Embedded secure hardware root-of-trust module

Striving to get to market first, Neuchips sought support, design and verification tools, and IP that could help the company accelerate its design cycle. It found what it needed through the AI Chip Design Lab, a joint effort between 草榴社区 and the Industrial Technology Research Institute (ITRI) in Taiwan. Many on the team were already familiar with 草榴社区 technologies, which made it an easy decision to collaborate with 草榴社区 on the ambitious project.

The AI Chip Design Lab is located at ITRI headquarters in Hsinchu, Taiwan. It receives support from the Technology Development Programs of the Department of Industrial Technology (DoIT) and the Ministry of Economic Affairs (MOEA) in Taiwan. The lab aims to help the country’s semiconductor industry advance through access to the latest design tools and design and verification services. One of the key offerings of the AI Chip Design Lab is a 草榴社区 system-level solution based on the ARC AI Reference Design Platform, spanning architecture design to virtual prototyping and system verification. The design platform is intended to help lower the barrier of entry into AI and to shorten design cycles.

Reducing Chip Development Time by More than One Year

Based on their unique characteristics, DLRMs can be difficult to accelerate with general-purpose AI accelerators. Neuchips developed its RecAccel-N3000 with tailored hardware IPs that accelerate embedding tables, matrix multiplication, and feature interaction. Working with 草榴社区 to implement early hardware/software co-development enabled by the ARC AI Reference Design Platform, Neuchips was able to save more than one year in chip development time. With the design platform, the team was able to develop and verify both the RecAccel-N3000 domain-specific AI accelerator’s PCIe 5.0 subsystem and its LPDDR5 subsystem early and then integrate them into the whole chip. The 草榴社区 ZeBu Server 4 emulation system in the cloud was used to verify the subsystems as well as the entire RecAccel-N3000.
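
For readers less familiar with the model structure those IP blocks target, the sketch below is a minimal NumPy rendering of the DLRM logical pipeline named above: sparse embedding-table lookups, dense matrix multiplication (the MLPs), and pairwise feature interaction. The dimensions and single-layer MLPs are made up for illustration; this is a generic DLRM sketch, not Neuchips’ hardware dataflow or Meta’s reference implementation.

```python
import numpy as np

# Illustrative DLRM-style forward pass with made-up sizes (not Neuchips' design).
rng = np.random.default_rng(0)
NUM_TABLES, ROWS, EMB_DIM, NUM_DENSE = 4, 1000, 16, 13

tables = [rng.normal(size=(ROWS, EMB_DIM)) for _ in range(NUM_TABLES)]  # sparse embedding tables
W_bottom = rng.normal(size=(NUM_DENSE, EMB_DIM)) * 0.1                  # bottom MLP (single layer here)
num_pairs = (NUM_TABLES + 1) * NUM_TABLES // 2
W_top = rng.normal(size=(EMB_DIM + num_pairs, 1)) * 0.1                 # top MLP (single layer here)

def dlrm_forward(dense_x, sparse_ids):
    # 1) Embedding-table lookups: the memory-bound stage (served by on-die SRAM / LPDDR5)
    emb = [tables[t][idx] for t, idx in enumerate(sparse_ids)]
    # 2) Matrix multiplication: bottom MLP over the dense input features
    bottom = np.maximum(dense_x @ W_bottom, 0.0)
    # 3) Feature interaction: pairwise dot products among all feature vectors
    feats = np.stack([bottom] + emb)                            # (NUM_TABLES + 1, EMB_DIM)
    pairs = (feats @ feats.T)[np.triu_indices(len(feats), k=1)]
    # Top MLP maps the combined features to a click-probability score
    z = np.concatenate([bottom, pairs]) @ W_top
    return 1.0 / (1.0 + np.exp(-z))                             # sigmoid

print(dlrm_forward(rng.normal(size=NUM_DENSE), [7, 42, 3, 99]))
```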

The RecAccel-N3000 also leverages an array of silicon-proven 草榴社区 IP, which helped the Neuchips team reduce integration risks and contributed to a shorter design cycle. 草榴社区 application engineers supported Neuchips in optimizing the code for its cloud-based chip design, configuring the IP, and running simulation and verification on the FPGA-based ZeBu Server 4 system, which accelerated full ASIC RTL simulations from two weeks down to about 20 minutes.
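
That change amounts to roughly a thousandfold speedup, assuming the two-week figure refers to continuous wall-clock simulation time; the quick arithmetic below makes the ratio explicit.

```python
# Rough speedup implied by the quoted figures (assumes continuous wall-clock time)
rtl_sim_minutes = 14 * 24 * 60    # "two weeks" of RTL simulation, in minutes
emulation_minutes = 20            # run time on the ZeBu Server 4 system
print(f"~{rtl_sim_minutes / emulation_minutes:.0f}x faster")  # ~1008x faster
```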

Other design and verification tools that played a part in the development of the RecAccel-N3000 include:

  • 草榴社区 Design Compiler RTL synthesis solution
  • 草榴社区 VCS functional verification solution
  • 草榴社区 SpyGlass static and formal verification platform
  • 草榴社区 Verdi automated debug system
  • 草榴社区 Formality equivalence checking
  • 草榴社区 PrimeTime static timing analysis tool
  • 草榴社区 PrimePower RTL-to-signoff power analysis tool
  • 草榴社区 IC Compiler II place-and-route solution

Summary

With recommendation systems becoming both more prevalent and more insightful in our digital world, Neuchips’ RecAccel-N3000 comes at a good time. By accelerating recommendation inference for data centers, the high-performance, energy-efficient, and scalable AI platform is poised to help a variety of industries personalize the customer experience online. Working closely with 草榴社区, ITRI, and others in the Taiwan semiconductor ecosystem, Neuchips Inc. has achieved the fast time-to-market needed to get a head start in the race to deliver impactful AI solutions.
