草榴社区

Enhancing Arm SoCs Performance with Smart Monitors

VIP Expert

Feb 27, 2023 / 4 min read

草榴社区 Verification IPs

Highlights

  • AXI Bus performance plays a key role in overall system performance
  • Arm AMBA 3 AXI and Arm AMBA 4 AXI allow multiple transactions to be processed without serialisation to improve performance
  • 草榴社区 Smart Monitors for Arm AMBA AXI interconnect is a best-in-class solution for measuring the multiple metrics required to optimize bus performance at the pre-silicon stage
  • Intelligent sparse dumping to ensure minimal impact on payload runtime performance

In an SoC where the AXI bus is used to move a considerable amount of data, the performance of the AXI bus may become a bottleneck for overall system performance. Increasing complexity and software content in SoCs are creating a need to shift performance verification left, to the pre-silicon stage, using real-life data payloads. Hardware-assisted verification platforms (the 草榴社区 ZeBu emulation system and the 草榴社区 HAPS FPGA prototyping system) are a necessity for running such large payloads.

How to Improve Throughput of AXI Bus

If an AXI bus is used for frequent bulk data transfers, achieving good throughput is very important. Throughput can be calculated by summing all the data bytes (AxSIZE) in each beat (RVALID/BVALID) captured on the AXI interface during the observation window and dividing that sum by the duration of the observation window. A window showing low throughput does not generally indicate an issue unless large amounts of data were expected to move quickly. A few reasons for low throughput could be:

  • Manager behaviour: Ideally the manager should assert AWVALID and WVALID on the same cycle. The manager should also be able to drive multiple beats by keeping WVALID high on consecutive cycles. If this is not the case, then the manager is restricting throughput for write transactions.
  • Valid/ready handshaking: The best performance can be achieved if xREADY is always high on both the manager and subordinate side. However, real-world DUTs must eventually de-assert xREADY when the internal pipeline is full. So, the manager/subordinate should ideally keep outstanding transactions within DUT pipelining limits to make sure there is no stalling.
  • Request to response latency: The subordinate might take a few cycles to respond to a write/read request. Peak performance is achieved when the response arrives on the cycle after the request was sampled by the subordinate. However, complex interconnect routing and memory access often take a few cycles before the response can be driven. (Figure 3)
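The throughput calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `(time_ns, axsize)` beat records and the `window_throughput` helper are hypothetical and are not the Smart Monitor output format. It relies on the AXI encoding in which a beat carries up to 2^AxSIZE bytes, and it assumes every beat is full width (write strobes and narrow transfers are ignored).

```python
from collections import defaultdict


def window_throughput(beats, window_ns):
    """Return {window_index: bytes per ns} for (time_ns, axsize) beat records.

    Hypothetical record format: one tuple per data beat, stamped with the
    time the beat was captured and the AxSIZE of its transaction.
    """
    bytes_per_window = defaultdict(int)
    for time_ns, axsize in beats:
        # AxSIZE encodes the beat width as a power of two (3 -> 8 bytes).
        bytes_per_window[time_ns // window_ns] += 1 << axsize
    # Divide the byte sum of each window by the window duration.
    return {w: total / window_ns for w, total in sorted(bytes_per_window.items())}


# Three 8-byte beats fall in the first 1000 ns window, one in the second.
beats = [(10, 3), (20, 3), (30, 3), (1500, 3)]
throughput = window_throughput(beats, window_ns=1000)
for window, rate in throughput.items():
    print(f"window {window}: {rate:.3f} bytes/ns")
```

Comparing the per-window rates makes a low-throughput window stand out, which is the starting point for checking the three causes listed above.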

How to Improve Transaction Performance of AXI Bus

Arm AMBA 3 AXI and Arm AMBA 4 AXI interconnects place no limit on outstanding transactions and even allow multiple outstanding transactions with the same ID. The ID (or a few bits of it) is often used to route the response from the subordinate to the correct manager, which is identified by a unique ID. A manager that can issue multiple outstanding transactions should do so only if the subordinate supports them as well; otherwise the subordinate will simply deassert its xREADY signals and cause a stall. Even a subordinate that supports outstanding transactions can accept them only while its internal pipeline is not full. Optimal performance is therefore obtained when the manager keeps the number of outstanding transactions at or below the pipeline depth of the subordinate, which allows the interconnect to process multiple transactions without any serialization.
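As a rough illustration of the outstanding-transaction reasoning, the sketch below tracks how many transactions are in flight over time and compares the peak with the pipeline depth of the subordinate. The event format, the `outstanding_profile` helper, and the depth value of 2 are all assumptions made for this example; they do not come from any tool.

```python
def outstanding_profile(events):
    """Return [(time, outstanding_count)] for a list of (time, kind) events.

    Hypothetical event kinds: "req" when an AW/AR request is accepted and
    "resp" when the matching transaction completes (B, or the last R beat).
    """
    count = 0
    profile = []
    for time, kind in sorted(events, key=lambda event: event[0]):
        if kind == "req":
            count += 1
        elif kind == "resp":
            count -= 1
        else:
            raise ValueError(f"unknown event kind: {kind!r}")
        profile.append((time, count))
    return profile


events = [(0, "req"), (2, "req"), (4, "req"),
          (5, "resp"), (7, "resp"), (9, "resp")]
profile = outstanding_profile(events)
peak = max(count for _, count in profile)

# Assumed pipeline depth of the subordinate, for illustration only.
pipeline_depth = 2
exceeds_depth = peak > pipeline_depth
print(f"peak outstanding = {peak}, exceeds depth {pipeline_depth}: {exceeds_depth}")
```

In this example the manager reaches three transactions in flight against an assumed depth of two, the kind of mismatch that would show up as xREADY being deasserted and a stall on the bus.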

Figure 2: Outstanding transaction count per observation window displayed in 草榴社区 Platform Architect

Figure 3: Write response latency displayed in Verdi Performance Analyzer

Figure 4: Read transaction count/throughput displayed in 草榴社区 Platform Architect

Smart Monitors for the Arm AMBA AXI interface let a user measure AXI bus performance and optimise the design for the desired performance before real silicon is taped out. To debug further into a window, the AXI traffic needs to be analyzed to trace the transaction(s) responsible for a drop in performance. Finally, the design needs to be checked for the possible causes of the observed deviation in those transactions.

How Smart Monitors for 草榴社区 ZeBu EP1 Help to Analyze AXI Bus Performance

Smart Monitors for the Arm AMBA AXI interface are DPI-based transactors, but they are passive components used only to capture bus traffic. A monitor can process protocol data either for functional verification or for performance analysis. For performance analysis, the monitor supports three modes (Figure 5):

  1. Python-based batch visualizations
  2. 草榴社区 Verdi Performance Analyzer based performance visualization for verification engineers
  3. 草榴社区 Platform Architect virtual prototyping solution-based performance visualization for software engineers

Any one of these modes can be used as per requirement to analyze AXI bus performance.

Figure 5

Smart Monitors provide the capability of generating the following performance metrics:

  1. Read/Write data byte count
  2. Read/Write data throughput
  3. Read/Write request count
  4. Read/Write completed transaction count
  5. Read/Write outstanding transactions
  6. Request (AW/AR) to response(B/R) latency
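To make the last metric concrete, here is a minimal sketch of how request-to-response latency could be derived from captured traffic. The `(time, txn_id)` records and the `response_latencies` helper are hypothetical and do not represent the monitor's interface. The matching uses one FIFO per ID, which relies on the AXI rule that transactions with the same ID complete in order.

```python
from collections import defaultdict, deque


def response_latencies(requests, responses):
    """Pair each response with the oldest pending request of the same ID.

    requests/responses: lists of (time, txn_id) tuples (hypothetical format).
    Returns a list of (txn_id, latency) in response order.
    """
    # Tag requests with 0 and responses with 1 so that a request sorts
    # before a response carrying the same timestamp.
    events = [(t, 0, i) for t, i in requests] + [(t, 1, i) for t, i in responses]
    pending = defaultdict(deque)
    latencies = []
    for time, is_response, txn_id in sorted(events):
        if not is_response:
            pending[txn_id].append(time)
        elif pending[txn_id]:
            latencies.append((txn_id, time - pending[txn_id].popleft()))
        else:
            raise ValueError(f"response for ID {txn_id} has no pending request")
    return latencies


requests = [(0, 1), (2, 2), (3, 1)]      # AW/AR handshakes
responses = [(5, 1), (6, 2), (10, 1)]    # B responses or last R beats
latencies = response_latencies(requests, responses)
print(latencies)  # [(1, 5), (2, 4), (1, 7)]
```

Binning these latencies per observation window gives the kind of series that a latency plot is built from.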

The 草榴社区 ZeBu EP1 emulation and prototyping system supports running real-time software payloads on your SoCs. The Smart Monitor architecture allows users to generate performance measurement data at almost the same runtime performance as without the monitor. In addition, the monitor can be dynamically configured to dump detailed transaction data when the user wishes to see transaction details for functional debug.

草榴社区 transactors, memory models, hybrid and virtual solutions based on 草榴社区 IP enable various verification and validation use-cases on the industry’s fastest verification hardware systems, 草榴社区 ZeBu and 草榴社区 HAPS.

To learn more about 草榴社区 hardware-assisted verification solutions visit: /verification/emulation.html
