Synopsys

Ensuring Reliability in HAPS Physical Prototyping Systems

Neil Songcuan

Mar 16, 2021 / 8 min read

Introduction

Physical prototyping systems are not new, but their packaging, components, and price points vary greatly. Given the large investment these systems represent, the hardware and its supporting components must be developed and qualified for integrity and reliability. This first part of a three-part series focuses on the hardware characterization of HAPS physical prototypes and how this step helps ensure hardware stability and reliability. Hardware characterization is a natural part of co-developing an integrated prototyping solution, because the prototyping implementation software needs detailed hardware information. Long before the first production HAPS system is built, Synopsys performs hardware characterization, which covers functional specification and application tests with objectives, results, and analysis. It includes the analysis of HapsTrak standard daughter boards, interconnect cables, system boards, power supply modules, and system cooling to provide guidance on the factors that influence performance. The second part will look at the HAPS system test methodology and high-availability objectives, which ensure that all HAPS systems are reliable and capable of maintaining high availability before they are shipped to customers. The third and final part will cover how the prototyping system’s supervisory function allows for the management of multiple systems, voltage and thermal monitoring, and fault detection.

So, what goes on behind the scenes of the design, specification, and characterization of a modern prototyping system? The fact is, not all prototypes are built the same. Some are built and customized for a specific design or project, while others are built to cater to any design size, ranging from IP to large-scale SoCs. Physical prototypes are no longer just a printed circuit board with commercial FPGAs, interfaces, and generic I/O connectors that provide connectivity to real-world external stimulus. They are designed with rigorous integrity and fidelity to support 24x7 uptime in server farms, with accessibility from anywhere in the world. Projects cannot afford to have a system go down and put schedules at risk. System downtime and equipment returns are mitigated by hardware characterization of every HAPS system component, combined with extensive testing, live-system monitoring, and fault detection utilities that prevent the system from being damaged by over- or under-voltage and over- or under-temperature conditions.

The reliability objectives for the HAPS systems have been refined over many years and multiple generations. They include:

  • Minimizing system downtime
  • Reducing equipment returns due to damage
  • Minimizing the need to troubleshoot system assembly and prototyping utility IP
  • Ensuring consistent and predictable function and performance characteristics across individual system units

Validating Interconnect Performance: HapsTrak 3

HapsTrak 3 is a mechanical and electrical standard for attaching daughter boards and interconnects to HAPS systems based on Xilinx Virtex-7 and UltraScale FPGAs. Motherboards carry Samtec SEARAY open-pin-field interconnects with SEAF "socket" type HapsTrak 3 connectors and a mechanical frame that allows a daughter board to be fastened in place. A daughter board with SEAM or SEAM-RA "terminal" type HapsTrak 3 connectors typically contains peripheral or interconnect circuits with physical interfaces such as DDR3, Ethernet, QSFP, PCIe, and GPIO.

The HapsTrak 3 SEARAY implementation combines single-ended and differential pair signaling. Single-ended (SE) performance of SEARAY is rated at up to 12.5 GHz at 3 dB insertion loss, and differential pair (DP) performance is rated at up to 13.0 GHz at 3 dB insertion loss. These performance characteristics are necessary to support the variety of signal standards offered by the Xilinx Virtex-7 and UltraScale SelectIO resources.
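
For readers less familiar with this kind of rating, a 3 dB insertion loss means that roughly half of the signal power, or about 71 percent of the voltage amplitude, arrives at the far side of the connector. The short Python sketch below shows the conversion. The function names are illustrative and are not part of any Synopsys or Samtec tool.

```python
def db_loss_to_power_ratio(loss_db: float) -> float:
    """Fraction of the input power that remains after an insertion loss in dB."""
    return 10 ** (-loss_db / 10)


def db_loss_to_voltage_ratio(loss_db: float) -> float:
    """Fraction of the input voltage amplitude that remains after an insertion loss in dB."""
    return 10 ** (-loss_db / 20)


if __name__ == "__main__":
    # The 3 dB point used in the connector bandwidth rating.
    print(f"power ratio at 3 dB:   {db_loss_to_power_ratio(3.0):.3f}")
    print(f"voltage ratio at 3 dB: {db_loss_to_voltage_ratio(3.0):.3f}")
```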

Synopsys conducts signal integrity validation of the HapsTrak 3 connectors and circuit board layout for a variety of SE and DP signal standards to quantify maximum performance. The example below illustrates the HapsTrak 3 SE signaling performance characterized by Synopsys. The characterization data is then used in co-development with the Synopsys HAPS ProtoCompiler tool to ensure that maximum performance is achieved on an IP or SoC design.

Figure 1: Eye diagram illustrating the single-ended signal quality of the SSTL signal standard at 1,000 Mbps, with a clean 566 ps eye opening.
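
To put the 566 ps figure in context: at 1,000 Mbps each bit occupies a unit interval (UI) of 1,000 ps, so the eye is open for a little over half of the bit period. A minimal sketch of that arithmetic follows, with illustrative function names.

```python
def unit_interval_ps(data_rate_mbps: float) -> float:
    """Bit period in picoseconds for a data rate given in Mbps."""
    # 1 Mbps corresponds to a 1 microsecond bit period, which is 1e6 ps.
    return 1e6 / data_rate_mbps


def eye_opening_fraction(eye_ps: float, data_rate_mbps: float) -> float:
    """Eye opening expressed as a fraction of the unit interval."""
    return eye_ps / unit_interval_ps(data_rate_mbps)


if __name__ == "__main__":
    ui = unit_interval_ps(1000)
    fraction = eye_opening_fraction(566, 1000)
    print(f"unit interval: {ui:.0f} ps, eye opening: {fraction:.1%} of UI")
```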

The total trip time of a signal travelling from a general-purpose FPGA I/O to a daughter board inserted into one of the 24 HapsTrak 3 connectors is a composite of worst-case FPGA delay, PCB Rx/Tx delay, and cable delay. This characterization compares the theoretical speed with the measured speed of a global synchronous design running on a HAPS system.
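
As a rough illustration of the theoretical side of that comparison, the maximum global synchronous clock frequency can be estimated as the reciprocal of the summed delays. The delay values below are placeholders chosen for the example, not HAPS characterization data, and this simplified model ignores setup time, clock skew, and jitter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PathDelays:
    """Delay components of one FPGA-to-daughter-board path, in nanoseconds."""
    fpga_ns: float   # worst-case FPGA delay
    pcb_ns: float    # PCB Rx/Tx trace delay
    cable_ns: float  # cable delay

    def total_ns(self) -> float:
        return self.fpga_ns + self.pcb_ns + self.cable_ns


def theoretical_fmax_mhz(delays: PathDelays) -> float:
    """Upper bound on clock frequency if the signal must arrive within one period."""
    return 1000.0 / delays.total_ns()


if __name__ == "__main__":
    # Hypothetical numbers for illustration only.
    example = PathDelays(fpga_ns=5.0, pcb_ns=1.0, cable_ns=2.0)
    print(f"total delay: {example.total_ns():.1f} ns")
    print(f"theoretical fmax: {theoretical_fmax_mhz(example):.1f} MHz")
```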

The tests produce measured pass/fail results using specific HAPS-80 systems configured in two different setups. In both cases, HapsTrak 3 connector J1 of the transmitting FPGA is cabled to HapsTrak 3 connector J1 of the receiving FPGA, and so on for the remaining connectors. The first setup places the transmitter in FPGA module device A and the receiver in device B. The second setup places the transmitter in device D and the receiver in device C. Both setups are run in two modes: aggressor/victim and random data pattern. The clock frequency is increased in steps of 5 MHz. Figure 2 shows the difference between measured and theoretical speed.
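
The stepping procedure can be pictured as a simple frequency sweep. In the sketch below, `link_passes` is a hypothetical stand-in for running the aggressor/victim or random data pattern test at a given clock frequency. The start and stop frequencies and the 140 MHz pass limit of the fake link are made-up values, not measured results.

```python
from typing import Callable, Optional


def highest_passing_mhz(
    link_passes: Callable[[int], bool],
    start_mhz: int,
    stop_mhz: int,
    step_mhz: int = 5,
) -> Optional[int]:
    """Return the last frequency that passed before the first failure, or None."""
    best = None
    for freq in range(start_mhz, stop_mhz + 1, step_mhz):
        if not link_passes(freq):
            break  # stop at the first failing step
        best = freq
    return best


def fake_link(freq_mhz: int) -> bool:
    """Hypothetical link model that passes up to and including 140 MHz."""
    return freq_mhz <= 140


if __name__ == "__main__":
    print(highest_passing_mhz(fake_link, start_mhz=100, stop_mhz=200))
```

In a real characterization run the callable would wrap the hardware test, and the measured result would then be compared against the theoretical limit derived from the delay budget.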

Figure 2: Measured versus theoretical maximum global synchronous performance

The 16 GTX transceiver channels of the Xilinx UltraScale VU440 FPGA of the HAPS-80 are available at two HapsTrak MGB (multi-gigabit) connectors per HAPS-80 PCB module. To measure the performance of the system, an IBERT design is combined with a HapsTrak 3 MGB loopback test daughter board. Xilinx ChipScope Pro software is used to sweep the “bathtub” and “eye” at various settings to confirm stable high-quality links. As an example, Figure 3 shows the minimum remaining data eye of the low-voltage differential signaling (LVDS) for a DDR memory interface.

Figure 3: Minimum remaining data eye (LVDS) – DDR Memory Interface

Validating Performance of Power Modules, Heating and Cooling

The HAPS power modules that drive the distributed power rails across system and daughter boards must meet the requirements of the ICs and support circuitry across a range of environmental conditions and system clock speeds. If a power module cannot supply adequate current, the system can fail to function or behave erratically when the supply rail voltage or current falls outside the minimum or maximum ratings of the system ICs. This is where characterization is critical, especially if these prototype systems will be used in a server farm, where uptime and reliability are essential. An optimized design, guided by requirements and verified by characterization, helps ensure HAPS system stability and minimizes ripple and noise on the power rails. This in turn helps ensure the signal integrity of interconnect signals and supports a high number of I/Os that may switch simultaneously. Decoupling scheme simulation results are applied to optimize the impedance of the PCB’s power distribution system (PDS) across a wide range of frequencies.

Power requirement analysis starts with the software estimation tool provided by Xilinx, called Xilinx XPower Estimator (XPE). A variety of device utilization and switching scenarios are examined to determine static and dynamic power requirements for the various FPGA core and I/O power supply inputs. As an example, Figure 4 illustrates a power estimation scenario using Xilinx XPE with the following design conditions:

  • 25 percent FF and LUT utilization
  • 100 percent BRAM utilization
  • 100 percent PCIe block utilization
  • 12.5 percent toggle rate at 400 MHz
  • GTX I/O running at full speed
  • 900 FPGA I/O toggling at 1,600 Mbps
  • Differential HSTL Class II DCI 1.8V

Figure 4: Example scenario of Xilinx XPower Estimator

In this scenario, the power supply must provide over 50 amps for the FPGA’s VCCINT supply rail and 31 amps for the VCCO 1.8V I/O supply rail. Based on the current and voltage requirements of major circuits such as FPGAs and HAPS daughter boards, various power supply candidates are reviewed and approved by Synopsys. Power supply unit (PSU) qualification involves testing the following characteristics at AC input voltages of both 100V and 240V:

  • Electrical noise
  • Startup behavior
  • Load change behavior
  • Shutdown behavior
  • Over-current protection
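
When sizing PSU candidates, rail currents such as those in the XPE scenario above are usually converted into power per rail. The sketch below does this for the two rails mentioned. The 31 A at 1.8 V figure comes from the scenario, the 50 A VCCINT current is the stated lower bound ("over 50 amps"), and the 0.95 V VCCINT voltage is an assumed nominal value for illustration rather than a number from this article.

```python
def rail_power_w(voltage_v: float, current_a: float) -> float:
    """Power drawn by a supply rail, P = V * I."""
    return voltage_v * current_a


# Rail name -> (voltage in volts, current in amps).
RAILS = {
    "VCCINT": (0.95, 50.0),    # voltage is an ASSUMED nominal value; current is a lower bound
    "VCCO_1V8": (1.8, 31.0),   # values taken from the XPE scenario
}


def total_power_w(rails: dict) -> float:
    """Sum of the power drawn by all listed rails."""
    return sum(rail_power_w(v, i) for v, i in rails.values())


if __name__ == "__main__":
    for name, (volts, amps) in RAILS.items():
        print(f"{name}: {rail_power_w(volts, amps):.1f} W")
    print(f"total for listed rails: {total_power_w(RAILS):.1f} W")
```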

A variety of PSU models from different manufacturers are assessed using the test setup shown in Figure 5. A programmable electronic load is applied to the PSU, and an oscilloscope monitors voltage and current. A transformer provides AC voltages of 100V and 240V. Figure 6 illustrates an over-current protection scenario in which the unit under test (UUT) is in continuous operation and the electronic load transitions, using a fast step ramp, from a static load at nominal maximum current demand to a maximum load demand of 400 amperes.

Figure 5: PSU test setup

Figure 6: Example analysis of over-current protection

PSU candidates that pass screening are then attached to a power module verification board to verify power rail operation of a HAPS system at various loads. Voltage ripple, startup limitations, and step load conditions are confirmed to be within the specifications of the system.

A large clock-edge-style counter is implemented in a HAPS system to maximize the dynamic power consumption of each FPGA, and the clock frequency is increased until the FPGA safe operating range is exceeded. Die temperature and the VCCINT, VCCAUX, and VCCBRAM supply voltages are measured during this high transient state. An example sweep is shown in the plots of Figure 7.

To measure the cooling performance of the HAPS system’s heat mitigation design, a test loads all FPGAs with the counter design and increases the clock frequency to the maximum range while measuring current load, wattage, and die temperatures (see Table 1).

 

Clock (MHz)   Load @ 12V (A)   Total power (W)   Max die temperature, devices A, B, C, D (degrees C)
100           30.0             360               72.0, 70.5, 78.0, 77.0
110           31.9             383               73.5, 72.0, 79.5, 78.5
120           33.3             400               75.0, 73.0, 81.5, 80.0
130           35.2             421               75.5, 73.5, 81.5, 80.5

 

Table 1: Cooling characteristics at high-speed system operation

Cooling performance varies across devices because of differences in fan performance, slight mechanical differences, and the placement of the HAPS System Supervisor module. All system cooling tests comply with the requirements of the commercial-grade FPGA devices, which are rated for a maximum of 85 degrees C.
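
The figures in Table 1 can be cross-checked with a few lines of Python. Multiplying the 12V load current by the supply voltage agrees with the reported total power to within about 1.5 W, and subtracting the hottest die temperature from the 85 degrees C rating gives the remaining thermal margin. The data below is copied from Table 1; the variable and function names are illustrative.

```python
SUPPLY_V = 12.0      # supply voltage of the measured load, from the table header
MAX_RATED_C = 85.0   # commercial-grade FPGA rating quoted in the text

# (clock MHz, load A, reported power W, max die temperatures for devices A, B, C, D)
TABLE_1 = [
    (100, 30.0, 360, (72.0, 70.5, 78.0, 77.0)),
    (110, 31.9, 383, (73.5, 72.0, 79.5, 78.5)),
    (120, 33.3, 400, (75.0, 73.0, 81.5, 80.0)),
    (130, 35.2, 421, (75.5, 73.5, 81.5, 80.5)),
]


def thermal_margin_c(die_temps) -> float:
    """Headroom between the hottest die and the rated maximum temperature."""
    return MAX_RATED_C - max(die_temps)


if __name__ == "__main__":
    for clock, amps, watts, temps in TABLE_1:
        computed = SUPPLY_V * amps
        print(
            f"{clock} MHz: computed {computed:.1f} W, reported {watts} W, "
            f"margin {thermal_margin_c(temps):.1f} C"
        )
```

At every clock setting devices C and D run hotter than A and B, which is consistent with the placement-related variation described above.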

Figure 7: VCCINT temperature and voltage at high transient load

Summary

With each new generation of HAPS, we ensure that hardware characterization is considered during the feasibility study and architectural phase of new system development. By considering system characterization early in the design process, we are able to ensure high performance and reliability of our HAPS systems and guarantee that they meet the physical prototype requirements of today’s SoC and ASIC designs. Hardware characterization is just the first step. It is supported by a comprehensive test methodology that ensures all HAPS systems are reliable and continue to meet the characterization guidelines. Look out for part 2 of this article series, where we will look into the HAPS system test methodology and the high-availability objectives we follow to ensure that all HAPS systems are qualified before they are shipped to customers.
