草榴社区

Production Test of System-in-Package with Die-to-Die PHY IP

Manuel Mota, Sr. Staff Product Marketing Manager, 草榴社区

Introduction

A key challenge facing the semiconductor industry is its inability to catch product defects early in the production phase. The cost (economic and reputational) associated with deploying a defective product to market is very significant. This is especially true for high-performance computing system-on-chip (SoC) designers that are targeting hyperscale data center, networking, and AI applications, since any product defect can have catastrophic impact on AI workloads or data processing.

The semiconductor industry has developed an array of test methodologies to improve the speed and coverage of production tests. These methodologies have been standardized to improve efficiency by using common testing metrics and interfaces at different stages of end-product manufacturing, from wafer testing to chip testing to board-level testing.

This article describes how efficient production test of system-in-packages (SiPs) using die-to-die PHY IP can ensure the end-product is not defective and the production yield is kept as high as possible. The article describes how die-to-die PHY IP internal test features can extend the test coverage across all dies.

SiP Testing Challenges

There is renewed interest in integrating multiple dies into the same package. This trend is driven by two factors: SoCs are growing in complexity and becoming too large for cost-effective monolithic integration, and there are flexibility advantages in implementing each SoC function in the process node that makes the most technical and economic sense.

A SiP is a chip that assembles several dies (or “chiplets”) in a single package. These can either be replicas of the same chiplet to enable increased system performance or different chiplets that cost-effectively bring more functionality to the system.

Often, the chiplets are from different vendors and are assembled together in the same package. As shown in Figure 1, modern 2.5D or 3D packaging technology assembles several dies in a complex way, making use of (simpler) organic substrates or (more complex) silicon interposers, silicon bridges, and through silicon vias (TSVs) to route the signals between the dies and to the package periphery.

Figure 1: Different packaging technologies with distinct routing features 

Individual dies, package “structures” (interposer, TSVs, bumps), and the package assembly can suffer from yield limitations. Even if the yield of each individual element is relatively high, the total SiP yield, which is the cumulative yield of all the different elements, can be prohibitively low, as seen in the following formula:

$$\text{Yield}_{\text{SiP}} = \left(\text{Yield}_{\text{Die}}\right)^{N} \times \text{Yield}_{\text{Package}} \times \text{Yield}_{\text{Assembly}}$$

where N = number of dies assembled in the same package.

As an example, a SiP with 4 dies, each with a yield of 90%, and a package and assembly yield of 100% has a total SiP yield of only ~65%. For large dies in advanced process nodes, an individual yield of 80% may be good, but the resulting SiP yield may be prohibitively low at approximately 41%. Basically, a defect in one die invalidates the complete SiP, including the remaining three non-defective dies.
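The arithmetic in this example can be reproduced with a short calculation. The following is a minimal sketch; the function name `sip_yield` and its parameters are illustrative, and it assumes that die, package, and assembly defects are independent, as the cumulative yield formula implies.

```python
def sip_yield(die_yield: float, num_dies: int,
              package_yield: float = 1.0,
              assembly_yield: float = 1.0) -> float:
    """Cumulative SiP yield when every element must be defect-free.

    Assumes independent defects, so the individual yields multiply.
    """
    return (die_yield ** num_dies) * package_yield * assembly_yield


if __name__ == "__main__":
    # Four dies per package, package and assembly yield of 100%.
    for die_yield in (0.90, 0.80):
        total = sip_yield(die_yield, num_dies=4)
        print(f"die yield {die_yield:.0%} -> SiP yield {total:.1%}")
```

With these inputs the calculation gives about 65.6% for a 90% die yield and about 41.0% for an 80% die yield, matching the figures quoted above.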

To improve yield, companies follow two directions:

  1. Identify and assemble only known good dies (KGD) in the package. In this case, the total SiP yield, in the above example, becomes equivalent to the single die yield.
  2. After assembly, validate functionality that spans across dies for detecting defects from the assembly process and other defects that may be too hard to identify by testing single dies (as an example, a defective bump may not be detected during single die testing).

Bypassing or otherwise overcoming identified defects can also help improve yield by implementing test and repair functionality at die level and at assembled system level. Such test and repair functionality can include redundancy or other schemes and is particularly useful for large, regular structures, such as memory or very wide busses across dies.

Given the complexity of SiP testing, with dies coming from different sources, standardizing test infrastructures and methodologies across the ecosystem is critical to the success of the SiP and chiplet ecosystem. The IEEE and other standards organizations are stepping up with new test architecture standards for 3D packaged dies.

SiP Testing Architecture

As an example, the recently published IEEE 1838 defines a standardized modular test access architecture for SiP products that enables system designers and test engineers to efficiently validate their products, as seen in Figure 2.

Figure 2: IEEE 1838 test access architecture for testing individual naked dies, assembled dies, and packaged SiP 

IEEE 1838 builds on existing test standards for monolithic SoCs, such as IEEE 1149.1, IEEE 1500, and others, to define a test architecture that manages the testing of both isolated and assembled dies, achieving complete test coverage of die and die-to-die functional blocks with only minimal additional test circuitry.

IEEE 1838 defines a serial port for test control and low-speed test data access (based on IEEE 1149.1), which is implemented in each die and remains accessible after final assembly, and an optional parallel test access port that may not be accessible after assembly. These ports either use a reduced set of test bumps for non-assembled die testing or attach seamlessly to the corresponding port in the adjacent die, extending the test infrastructure to cover intra-die or inter-die testing after assembly.

In addition, IEEE 1838 defines a test hierarchy that divides the effort among intra-die testing for KGDs, inter-die testing after package assembly, and testing of the package assembly itself, as shown in Figure 2.

Inside each die, additional test hierarchy can be defined, following established methodologies to test digital logic blocks, memory blocks, and others with scan chains and built-in self-test (BIST) structures. Testing the digital connections between dies is based on boundary scan chains.

Testing of high-speed analog blocks is often based on functional testing. However, these blocks can also be integrated into the test management hierarchy by adding suitable test wrappers that interface with the test infrastructure, as shown in Figure 3.

Figure 3: Test architecture hierarchy inside the chiplet, including wrappers for integration of high-speed analog block test features in the overall test infrastructure

To enable test automation and reduce test time, the high-speed analog block, for example a high-speed PHY IP, must provide adequate test coverage. This becomes more challenging when considering high-speed die-to-die links. For these cases, the complete link, including the PHYs on both dies, the associated bumps, and the package link, needs to be tested, relying on the test infrastructure built into the high-speed PHY.

The high-speed PHY for die-to-die connectivity must include a number of design-for-test (DFT) features:

  • Scan chains for static and at-speed detection of faults (stuck at, opens, slow path/transition) in digital circuitry
  • Built-in self-test (BIST) functionality, when possible, for specific digital and analog blocks
  • Internal loopbacks to test the individual PHY; these loopbacks can be shallow (covering the digital circuitry) or deep (covering the entire transmit and receive signal path up to the bump, or as close to the bump as possible without impacting mission-mode performance)
  • Pattern generators and matchers supporting pseudo-random patterns or specific patterns
  • Ability to scan reference and phase to generate pass/fail eye diagrams to determine design margins
  • External loopbacks from one die to the next extending test coverage to the bumps and the die-to-die trace, as shown in Figure 4.

Figure 4: Die-to-die PHY implementing internal and external loopbacks
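The pattern generator, matcher, and loopback features listed above can be illustrated with a small behavioral model. This is a minimal sketch and not a description of any particular PHY: it assumes a PRBS7 sequence (polynomial x^7 + x^6 + 1), a matcher that is seeded identically to the generator rather than self-synchronizing, and a hypothetical `channel` callable that stands in for the loopback path (internal or die-to-die).

```python
from typing import Callable, Iterator


def prbs7(seed: int = 0x7F) -> Iterator[int]:
    """Yield PRBS7 bits from a 7-bit LFSR (polynomial x^7 + x^6 + 1)."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR locks up")
    while True:
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def loopback_bit_errors(channel: Callable[[int], int],
                        num_bits: int, seed: int = 0x7F) -> int:
    """Send PRBS7 bits through `channel` and count mismatches.

    The matcher regenerates the expected sequence from the same seed,
    which is a simplification of a real pattern checker.
    """
    transmitter = prbs7(seed)
    expected = prbs7(seed)
    errors = 0
    for _ in range(num_bits):
        received = channel(next(transmitter))
        if received != next(expected):
            errors += 1
    return errors


if __name__ == "__main__":
    healthy = loopback_bit_errors(lambda bit: bit, num_bits=1000)
    stuck_at_zero = loopback_bit_errors(lambda bit: 0, num_bits=1000)
    print(f"healthy link errors: {healthy}")
    print(f"stuck-at-0 link errors: {stuck_at_zero}")
```

A healthy loopback returns zero errors, while a path that is stuck at 0 fails on every transmitted 1. The same comparison applies whether the loop is closed inside one PHY or across the die-to-die trace; only the portion of the signal path covered changes.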

Known Good Die Testing

A mandatory initial step is to identify defective dies before assembly in the SiP, so only KGDs are assembled, significantly improving the overall production yield.

KGD testing is performed on the naked die, prior to packaging. For an IEEE 1838 compliant die, the standard serial and parallel test access ports provide access to the complete test infrastructure of the die via a reduced set of test bumps.

The test features within the analog blocks such as the high-speed PHY IP are also interconnected with the die test infrastructure by an IEEE 1500 compliant wrapper to also allow PHY testing.

Depending on the die’s built-in test capabilities and the individual blocks in the die, the test coverage can be very high, ensuring a KGD is correctly identified. However, even in the best test coverage scenarios, there are items that cannot be adequately covered at the naked die level. For example, faulty bumps or the last stages of sensitive output drivers and first stages of low-noise amplifiers that could not be included in the high-speed PHY’s deep loopback are not covered. Other examples include functions that straddle the two dies such as a control loop.

Extending coverage to such missing items, as well as to the inter-die connections, is handled in the next steps of the test strategy, which are performed on the assembled SiP.

Assuming both dies are IEEE 1838 compliant, the dies' test infrastructure is seamlessly merged into a single structure that is accessed at the test ports of a single (the “first”) die and extended to the next die via secondary test ports.

It is now possible to launch tests, such as boundary scan EXTEST for digital pins and cross-die loopback tests for high-speed PHYs, extending the test coverage to the periphery of the dies and to the package itself.
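The idea behind an EXTEST-style interconnect test can be sketched with a simple behavioral model: boundary cells on one die drive a pattern onto the inter-die nets, boundary cells on the other die capture what arrives, and any mismatch flags a suspect net. The fault model below is an assumption made for illustration only (an open net is read as 0 and shorted nets behave as a wired-AND), and the function names are hypothetical rather than taken from any standard.

```python
from typing import Iterable, List, Sequence, Tuple


def capture(driven: Sequence[int],
            opens: Iterable[int] = (),
            shorts: Iterable[Tuple[int, int]] = ()) -> List[int]:
    """Model the values captured by the receiving die's boundary cells."""
    received = list(driven)
    for net_a, net_b in shorts:
        # Assumed wired-AND behavior for two shorted nets.
        received[net_a] = received[net_b] = driven[net_a] & driven[net_b]
    for net in opens:
        # Assumed: a disconnected receiver input reads as 0.
        received[net] = 0
    return received


def interconnect_test(num_nets: int,
                      opens: Iterable[int] = (),
                      shorts: Iterable[Tuple[int, int]] = ()) -> List[int]:
    """Apply walking-one and walking-zero patterns; return failing nets."""
    opens, shorts = set(opens), list(shorts)
    failing = set()
    for active in range(num_nets):
        for invert in (False, True):
            driven = [int((net == active) != invert)
                      for net in range(num_nets)]
            received = capture(driven, opens, shorts)
            failing.update(net for net in range(num_nets)
                           if received[net] != driven[net])
    return sorted(failing)


if __name__ == "__main__":
    print(interconnect_test(4))                   # fault-free interface
    print(interconnect_test(4, opens=[2]))        # open bump on net 2
    print(interconnect_test(4, shorts=[(0, 1)]))  # nets 0 and 1 shorted
```

Walking patterns need 2N vectors for N nets, which is acceptable for a sketch; production flows typically use more compact pattern sets to keep test time down on wide interfaces.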

Additional Yield Improvement Strategies

Note that in some special cases, the hierarchical test methodology described above may not be enough to improve yield to the required levels.

Consider a wide parallel interface between two dies: for example, high-bandwidth memory (HBM) between a memory die and a digital chip, or high-bandwidth interconnect (HBI) / advanced interface bus (AIB) between two digital chips. These interfaces may have thousands of pins that use micro-bumps and very dense traces on an interposer to connect the dies. In this case, the yield of the substrate traces or micro-bumps may be low enough to result in the loss of KGDs. For these cases, a complementary test and repair strategy, relying on redundant pins on each PHY and the corresponding redundant micro-bumps and traces, enables additional yield recovery after final product assembly.
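A repair step of this kind can be pictured as remapping logical lanes onto the physical lanes that passed the post-assembly test. The following is a minimal sketch under assumed conditions: any logical lane can be steered to any good physical lane, and both dies apply the same map. Actual interfaces usually restrict how far a lane can be shifted, and the function name `repair_lane_map` is hypothetical.

```python
from typing import Dict, Iterable, Optional


def repair_lane_map(num_logical: int, num_physical: int,
                    failing: Iterable[int]) -> Optional[Dict[int, int]]:
    """Map logical lanes onto good physical lanes, skipping failed ones.

    Returns None when too few good lanes remain, meaning the
    interface cannot be repaired with the available redundancy.
    """
    bad = set(failing)
    good = [lane for lane in range(num_physical) if lane not in bad]
    if len(good) < num_logical:
        return None
    # Keep lane order: logical lane i uses the i-th good physical lane.
    return {logical: good[logical] for logical in range(num_logical)}


if __name__ == "__main__":
    # 4 logical lanes, 6 physical lanes (2 spares), physical lane 1 failed.
    print(repair_lane_map(4, 6, failing=[1]))
    # Only 1 spare but 2 failures: not repairable.
    print(repair_lane_map(4, 5, failing=[0, 3]))
```

In the first case the failed lane is skipped and the remaining lanes shift onto a spare, so the assembled part is recovered. In the second case the redundancy is exhausted and the function reports that no valid map exists.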

Conclusion

As market demand grows for integrating multiple dies into the same package for high-performance computing and many other applications, testing the dies (pre- and post-assembly) has become critical to achieving the required yield. A standards-based on-die test infrastructure must enable extended test coverage at both the naked die level and the assembled SiP level. Die-to-die interfaces play an important role in the test strategy because their function straddles both dies that make up the link. The die-to-die PHY IP must include test functionality that simplifies testing of the PHY at the naked die level and of the link itself after assembly, while remaining integrated in the chip test infrastructure.

草榴社区 provides a portfolio of die-to-die PHY IP for USR/XSR and HBI links. The embedded bit error rate (BER) tester and non-destructive 2D eye monitor capabilities provide on-chip testability and visibility into channel performance. With IP available in advanced FinFET processes, along with all the necessary analysis and reports for easy integration, 草榴社区 gives designers the comprehensive support they need to accelerate high-performance computing SoC designs for hyperscale data center, networking, and AI applications.