Vadhiraj Sankaranarayanan, Sr. Technical Marketing Manager, Synopsys
Graham Allan, Sr. Staff Product Marketing Manager, Synopsys
Brett Murdock, Sr. Staff Product Marketing Manager, Synopsys
Double Data Rate Synchronous Dynamic Random-Access Memory (DDR SDRAM, or simply DRAM) is the de facto memory technology in almost all applications today, from high-performance computing to power- and area-sensitive mobile applications. JEDEC, the standards body responsible for memory standards, has defined four categories of DDR DRAM standards (standard DDR: DDR, DDR2/3/4/5; mobile DDR: LPDDR2/3/4/5; graphics DDR: GDDR3/4/5/6; and high-bandwidth DRAM: HBM, HBM2/2E/3) so that designers can precisely meet their memory requirements. Figure 1 shows a high-level block diagram of a memory channel in a System-on-Chip (SoC). High-performance SoCs typically have multiple memory channels. The simplified DDR memory shown in the figure can be a DRAM from any of the above categories.
Ensuring reliable and robust links in the memory channel requires the memory interface to be trained. This article outlines three different ways a DDR memory interface can be trained and focuses on the advantages of firmware-based training.
As shown in Figure 1, a typical memory channel consists of a DDR controller that interfaces with an SoC interconnect, such as an AXI interconnect. The DDR controller converts the incoming AXI transactions from the interconnect into DDR commands and schedules them in an optimal order to be sent to the DDR memory through the PHY and the memory channel. The DDR PHY is the conduit between the controller and the DDR memory, and it plays a critical role in transferring data between them reliably, without bit errors. To ensure the robustness of the DDR channel during mission mode, the memory interface on the SoC and the DRAM are trained during initialization after power-up. At a high level, training involves sending various patterns to the memory, exercising the channel by varying time delays and voltages for both Reads (RD) and Writes (WR), and then finding the optimal setting in both the time and voltage domains for each RD/WR parameter. This applies to both the command/address lanes and the data lanes, depending on the DDR standard and operating speed. Hence, a key requirement for a robust memory system is to train the DDR channel so that it has optimal signal integrity in both the time and voltage domains. The resulting data eyes at the receivers in the SoC memory interface and in the DRAM can then handle peak traffic during mission mode.
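The sweep-and-center idea described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the algorithm used by any particular PHY: the names `train_lane`, `longest_pass_run` and `toy_channel` are hypothetical, the pass/fail model is an invented diamond-shaped eye, and real training adjusts many more parameters per lane.

```python
from typing import Callable, List, Sequence, Tuple


def longest_pass_run(flags: Sequence[bool]) -> Tuple[int, int]:
    """Return (start, length) of the longest run of True values, or (0, 0)."""
    best_start, best_len, run_start = 0, 0, None
    # A trailing False closes a run that reaches the end of the sweep.
    for i, ok in enumerate(list(flags) + [False]):
        if ok and run_start is None:
            run_start = i
        elif not ok and run_start is not None:
            if i - run_start > best_len:
                best_start, best_len = run_start, i - run_start
            run_start = None
    return best_start, best_len


def train_lane(passes: Callable[[int, int], bool],
               delays: Sequence[int],
               vrefs: Sequence[int]) -> Tuple[int, int]:
    """Sweep delay and reference voltage, then pick a centered setting."""
    # grid[v][d] is True when the pattern was received without errors.
    grid: List[List[bool]] = [[passes(d, v) for d in delays] for v in vrefs]

    # Time domain: find the widest passing delay window over all voltages.
    best_start, best_len = 0, 0
    for row in grid:
        start, length = longest_pass_run(row)
        if length > best_len:
            best_start, best_len = start, length
    if best_len == 0:
        raise RuntimeError("no passing setting found")
    d_idx = best_start + (best_len - 1) // 2

    # Voltage domain: center the reference voltage at the chosen delay.
    column = [row[d_idx] for row in grid]
    v_start, v_len = longest_pass_run(column)
    v_idx = v_start + (v_len - 1) // 2
    return delays[d_idx], vrefs[v_idx]


def toy_channel(delay: int, vref: int) -> bool:
    """Hypothetical pass/fail model: a diamond eye centered at (32, 40)."""
    return 3 * abs(delay - 32) + 4 * abs(vref - 40) < 60


trained = train_lane(toy_channel, range(64), range(80))
print(trained)  # (32, 40)
```

In this toy model the pass/fail callback stands in for "send a pattern, read it back, compare"; the quality of the trained setting therefore depends entirely on how well that pattern represents mission-mode traffic.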
There are three different ways a DDR memory interface can be trained: by the host CPU running SW or FW code, by HW state machines, or by the PHY running FW code.
The first option, in which the CPU takes responsibility for training the memory interface of every channel through SW or FW code, is very time-consuming because it uses CPU cycles that are needed to initialize other components.
The second option, although faster than the first, commits the training algorithms to HW state machines. Hence, it lacks the field-upgradability of the other two options, and fixing a bug in the HW often costs the time and money of an SoC re-spin. This option is also design-intensive and consumes more area and power when supporting multiple DDR standards, since each standard may require its own custom algorithms and implementation. Finally, supporting complex data patterns may not be feasible from an area and power perspective. Hence, the training patterns implemented in this scheme are typically traditional, simpler patterns that toggle at a fixed frequency and do not excite signal integrity effects such as crosstalk, inter-symbol interference, and jitter to the worst-case degree.
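One way to see why a fixed-frequency toggling pattern is a weak stimulus is to look at the run lengths it contains. Inter-symbol interference depends on the bits that precede each transition, so a pattern with a single run length exercises only one such history. The sketch below is illustrative only: `run_lengths` is a hypothetical helper and the `mixed` pattern is an arbitrary hand-written example, not a pattern taken from any DDR standard.

```python
from itertools import groupby
from typing import List, Sequence


def run_lengths(bits: Sequence[int]) -> List[int]:
    """Return the distinct lengths of consecutive identical bits, sorted."""
    return sorted({len(list(group)) for _, group in groupby(bits)})


# A clock-like pattern that toggles on every bit.
toggle = [0, 1] * 8

# An arbitrary pattern with a mix of short and long runs.
mixed = [0, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 0]

print(run_lengths(toggle))  # [1]
print(run_lengths(mixed))   # [1, 2, 3, 4]
```

The toggling pattern never presents a long run followed by a single opposite bit, which is typically one of the harder cases for a receiver.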
The third option, training by the PHY using FW code, is the most robust of the three.
Firmware-based training combines the benefits of the first two options by performing the training in FW while localizing its execution entirely to the PHY. This allows each of the memory channels on the SoC to be trained in parallel, and the host CPU can spend its cycles on other initialization activities while the memory is being initialized. Moreover, this fast and accurate training mechanism provides a common HW framework that can support multiple DDR standards. The training data patterns can also be as complex as needed, since the FW for each DDR standard can be customized with its own patterns (e.g., Pseudorandom Binary Sequence 23, or "PRBS23", vs. PRBS31). Finally, this approach is field-upgradable, a feature of great utility when supporting emerging standards, which may go through further revisions at JEDEC before they become well entrenched in the industry. These advantages are why Synopsys has chosen this method of training in our DesignWare DDR PHY IP.
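For readers unfamiliar with PRBS patterns, the following sketch generates them with a linear-feedback shift register. The polynomials used (x^7 + x^6 + 1, x^23 + x^18 + 1, x^31 + x^28 + 1) are the commonly used PRBS7, PRBS23 and PRBS31 definitions; the function name `prbs_bits`, the `PRBS_TAPS` table, and the seed value are assumptions made for illustration, and the article does not describe how the PHY firmware actually produces its patterns.

```python
from typing import Dict, Iterator

# Feedback tap for each PRBS order: PRBSn uses x^n + x^tap + 1.
PRBS_TAPS: Dict[int, int] = {7: 6, 23: 18, 31: 28}


def prbs_bits(order: int, count: int, seed: int = 1) -> Iterator[int]:
    """Yield `count` bits of a PRBS of the given order (7, 23 or 31)."""
    tap = PRBS_TAPS[order]
    mask = (1 << order) - 1
    state = seed & mask
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays at zero")
    for _ in range(count):
        # XOR the two tapped register bits to form the next output bit.
        new_bit = ((state >> (order - 1)) ^ (state >> (tap - 1))) & 1
        # Shift the register and feed the new bit back in.
        state = ((state << 1) | new_bit) & mask
        yield new_bit


# PRBS7 is short enough to check by hand: it repeats every 2**7 - 1 bits.
prbs7 = list(prbs_bits(7, 2 * 127))
print(prbs7[:127] == prbs7[127:])  # True

# Longer sequences use the same generator with a different order.
first_prbs23 = list(prbs_bits(23, 16))
first_prbs31 = list(prbs_bits(31, 16))
```

A PRBS23 sequence repeats only after 2^23 - 1 bits and PRBS31 after 2^31 - 1 bits, so the choice between them is a trade-off between pattern richness and how much of the sequence a training step can afford to send. Because the generator is just code, swapping one pattern for another is a firmware change rather than a hardware change.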
Figure 2 compares the data eyes resulting from the second and third options. Training through HW state machines uses traditional data patterns, which may not be fully representative of mission-mode traffic. Hence, HW state-machine-based training may produce an open eye (shown in white) during training, but the trained settings may not be robust in both directions during peak mission-mode traffic. The HW state-machine-based training settings (white star) are not centered in the worst-case data eye, whose center is marked by the purple star, resulting in hold-time challenges. On the other hand, training through FW using sophisticated data patterns (such as pseudorandom binary sequences) typically results in a smaller eye (shown in purple) during training, so the trained voltage and time domain settings (purple star) are optimized for more robust performance during mission-mode traffic.
Figure 2: Results of HW vs FW based training in mission mode
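The centering argument behind Figure 2 can be illustrated with a one-dimensional toy calculation. All numbers below are invented for illustration and are not taken from the figure: the eye seen with a simple pattern is assumed to span delay taps 10 to 50, and the worst-case eye is assumed to span taps 24 to 48. The helper names `center` and `margins` are likewise hypothetical.

```python
from typing import Tuple

Window = Tuple[int, int]  # (first passing tap, last passing tap)


def center(window: Window) -> int:
    """Return the midpoint of a passing window, as training would pick it."""
    low, high = window
    return (low + high) // 2


def margins(setting: int, window: Window) -> Tuple[int, int]:
    """Return the (left, right) margin of a setting inside a window."""
    low, high = window
    return setting - low, high - setting


simple_pattern_eye: Window = (10, 50)  # assumed eye seen with a simple pattern
worst_case_eye: Window = (24, 48)      # assumed eye under mission-mode traffic

hw_setting = center(simple_pattern_eye)  # trained on the simple pattern
fw_setting = center(worst_case_eye)      # trained on a stressful pattern

print(hw_setting, margins(hw_setting, worst_case_eye))  # 30 (6, 18)
print(fw_setting, margins(fw_setting, worst_case_eye))  # 36 (12, 12)
```

In this example the setting trained on the simple pattern keeps only 6 taps of margin on one side of the worst-case eye, while the setting trained on the stressful pattern keeps 12 taps on both sides, even though the eye it saw during training was smaller.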
Accurate memory interface training determines the robustness of the memory channel. Although there are three ways to train the memory interface, PHY training using firmware is the optimal mechanism because it is fast, accurate, and field-upgradable. Synopsys uses this method of training in all of our DesignWare DDR PHY IP that requires complex training, helping customers achieve their memory interface performance targets.