By Mahurshi Akilla, Corporate Applications Engineer, Synopsys and Lakshmi Gopalakrishnan, Corporate Applications Engineer, Synopsys
Efficient datapath extraction is essential to achieving good quality of results (QoR), particularly in designs with large datapath content. This article highlights the importance of RTL coding style, describes how coding style can influence datapath extraction, and explains some of the datapath analysis techniques that designers can use to improve the QoR of their designs.
To get the maximum benefit from datapath extraction, designers should follow the coding guidelines that allow Design Compiler to extract the largest possible datapath block. A larger extracted datapath contains more arithmetic operators, which gives the tool more room for high-level optimizations and leads to a better-optimized design during synthesis. To find out more about the coding guidelines that provide good QoR, refer to the SolvNet article on datapath coding guidelines (SolvNet login required).
The analyze_datapath_extraction command in Design Compiler (DC) gives feedback whenever datapath extraction is blocked due to RTL coding style. This feedback is provided as HDL messages. Making changes to the coding style based on this feedback can greatly improve how efficiently DC can optimize the design and give better QoR.
In DC version I-2013.12, the analyze_datapath_extraction command was enhanced to quickly determine and prioritize the datapath leakage issues in large hierarchical designs. To find out more about these enhancements, refer to “What’s New with DesignWare Building Blocks and minPower Components in I-2013.12.”
Table 1 shows different kinds of HDL messages reported by the analyze_datapath_extraction command. Designers should first focus on the HDL messages that identify arithmetic operators on the timing critical blocks or on the blocks that consume relatively higher power, as fixing those cases provides the greatest QoR benefit. In addition, note that HDL-120 and HDL-132 messages provide good QoR benefit with relatively simple RTL changes, and these messages also tend to occur more frequently in typical designs.
Table 1: HDL messages reported by the analyze_datapath_extraction command
Datapath leakage happens when an internal operand is not wide enough to store the result of an operation, but the full result is required later for another operation.
In the “Bad QoR” example in Table 2, the result of the operation a * b + c should be 17 bits wide, but it is truncated to a 16-bit value by the assign statement in line 5. In line 6, the addition extends that value back to 17 bits, which causes datapath leakage. The analyze_datapath_extraction command identifies this case and issues an HDL-120 message. Correcting the bit width of the intermediate value 't', as shown in the “Good QoR” example, resolves this issue and provides better QoR.
Table 2: Improving QoR by increasing bit width
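The truncate-then-extend pattern described above can be sketched as follows. This is a minimal illustration, not the original Table 2 listing: the module names (leak_bad, leak_good), the input d, and all port widths are assumptions chosen to match the 16-bit and 17-bit widths mentioned in the text.

```verilog
// Sketch only: names and widths are assumptions, not the original listing.

// Bad QoR: t is 16 bits, so the 17-bit result of a*b + c is truncated,
// then extended again when it feeds the 17-bit addition that drives z.
module leak_bad (
  input  [7:0]  a, b,
  input  [15:0] c, d,
  output [16:0] z
);
  wire [15:0] t;
  assign t = a * b + c;   // carry-out of the addition is dropped here
  assign z = t + d;       // t is zero-extended back to 17 bits: leakage
endmodule

// Good QoR: t is wide enough to hold the full result of a*b + c.
module leak_good (
  input  [7:0]  a, b,
  input  [15:0] c, d,
  output [16:0] z
);
  wire [16:0] t;
  assign t = a * b + c;   // full 17-bit result is kept
  assign z = t + d;       // truncation happens only at the final output
endmodule
```

The two modules are not functionally identical when a * b + c overflows 16 bits, so widening an intermediate value should be checked against the intended behavior of the design.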
For large designs, there may be multiple HDL leakage messages (HDL-120 and HDL-132) to review. To make this task easier, analyze_datapath_extraction sorts the messages when the -sort or -max switches are used. The messages are sorted based on the priority of the potential QoR impact. This priority is determined based on the following criteria:
When RTL coding guidelines are not followed, designers can use the HDL messages provided by the analyze_datapath_extraction command to modify RTL and improve QoR.
Example: A Verilog design consisting of datapath logic that does not follow the RTL coding guidelines:
module logicblock (
input [15:0] a5, b5,
output [31:0] c5
);
assign c5 = a5 * b5;
endmodule
module top (
input clk, en,
input [15:0] a1, a2, a3, a4, a5, a6, a7,
input [15:0] b1, b2, b3, b4, b5, b6, b7,
output [32:0] z1,
output [32:0] z2,
output reg [127:0] z3);
wire [32:0] c1 = a1*b1 + b2;
wire [32:0] c2 = a2*b2 + b3;
wire [31:0] c3 = a3*b3 + b4; //truncating result
wire [32:0] c4 = a5*b5 + b6;
wire [31:0] c5;
wire [31:0] c6;
reg [63:0] c8, c9;
//part of datapath is at a different hierarchy
logicblock i_logicblock (a5, b5, c5);
//simple multiplication operation done using DW02_mult instead of * operator
DW02_mult #(16, 16) U1 ( .A(a6), .B(b6), .TC(1'b0), .PRODUCT(c6) );
//extension of truncated 32 bit result in c3 to 33 bits
assign z1 = c1 + c2 + c3 + c4 + c5 + c6 + b7;
wire [28:0] c7_t = 0 - a7; //output of operation is treated as signed
assign z2 = c7_t + b7; //driver(signed)/load(unsigned) mismatch at fanin
//operation broken down with manual pipelining
always @(posedge clk) begin
if (en) begin
c8 <= c1*c2;
c9 <= c3*c4;
z3 <= c8*c9;
end
end
endmodule
The analyze_datapath_extraction command is used in the compile script after the RTL elaboration stage and prior to the compile command, and helps designers understand the issues that block datapath extraction. In a large design that may contain multiple HDL messages from analyze_datapath_extraction, understanding which messages provide the best QoR improvement enables designers to address those messages first. The following section suggests the process to fix the analyze_datapath_extraction messages generated for the above design.
First, look at the timing report of the design and determine if datapath logic is present in critical paths. If HDL messages identify operators that happen to be on the timing critical paths, address them first. Pipelining is used to improve the throughput in high performance designs. However, datapath optimization may be prevented if manual pipelining is done by sub-optimally placing the pipeline registers in between datapath logic. When pipelining occurs around datapath logic, the retiming feature should be used instead of manual pipelining, as retiming tends to provide better QoR benefit as shown:
a. Look at the output of the report_qor command to see if there are any paths that violate timing.
Timing Path Group 'clk'
Levels of Logic : 31.00
Critical Path Length : 4.54
Critical Path Slack : -0.13
Critical Path Clk Period: 4.66
Total Negative Slack : -8.22
No. of Violating Paths : 81.00
b. Once the violating paths are clear, the next step is to check if any of these paths contain datapath components. Use the report_timing command to look at the top violating timing paths. Extracted datapath blocks can be identified in a timing path as they have the prefix “DP_OP_” in the cell names. It is harder to identify singleton datapath components as they get ungrouped during synthesis, so it is essential to look at the actual cells in the critical path and analyze, as shown:
Startpoint: c8_reg_41_ (rising edge-triggered flip-flop clocked by clk)
Endpoint: z3_reg_114_ (rising edge-triggered flip-flop clocked by clk)
Path Group: clk
Path Type : max
Des/Clust/Port     Wire Load Model     Library
----------------------------------------------------------------
Top                B0.2X0.2            ts28nphhpmc_ss0p9vn40c
Point                                      Incr       Path
----------------------------------------------------------------
clock clk (rise edge)                      0.00       0.00
clock network delay (ideal)                0.00       0.00
c8_reg_41_/CK (SEM_FDPHQ_2)                0.00       0.00 r
c8_reg_41_/Q (SEM_FDPHQ_2)                 0.20       0.20 f
U472/X (SEM_INV_9)                         0.08       0.29 r
U7630/X (SEM_EN2_8)                        0.18       0.47 r
U5794/X (SEM_INV_6)                        0.15       0.62 f
U5704/X (SEM_ND2_12)                       0.08       0.70 r
U136/X (SEM_INV_18)                        0.04       0.74 f
U5279/X (SEM_INV_32)                       0.05       0.79 r
U8180/X (SEM_OAI22_1)                      0.13       0.91 f
U8260/CO (SEM_ADDF_V1_2)                   0.34       1.25 f
U8432/S (SEM_ADDF_V1_2)                    0.50       1.76 r
U1748/X (SEM_EN2_8)                        0.21       1.96 r
U1198/X (SEM_EN2_8)                        0.22       2.19 r
..
..
..
..
data arrival time                                     4.54
clock clk (rise edge)                      4.66       4.66
clock network delay (ideal)                0.00       4.66
z3_reg_114_/CK (SEM_FDPRBQ_V2_2)           0.00       4.66 r
library setup time                        -0.25       4.41
data required time                                    4.41
----------------------------------------------------------------
data required time                                    4.41
data arrival time                                    -4.54
----------------------------------------------------------------
slack (VIOLATED)                                     -0.13
The timing path above shows full adders (SEM_ADDF_V1_2) in the timing critical path. RTL Cross Probing in Design Vision (Figure 1) shows that this adder is part of a datapath block: the “Origin” field, listed as “datapath”, indicates that the cell (U8260) was created by datapath optimization. Similar results are seen for cell U8432. (For more information on RTL Cross Probing, refer to the Design Compiler User Guide.)
Figure 1: Using RTL Cross Probing in Design Vision
c. The next step is to check for analyze_datapath_extraction messages that point to logic in this register-to-register path (HDL-121 messages). The RTL code in the critical path uses manual pipelining. Using the retiming feature in these paths can help to ease timing. (Refer to SolvNet Article 015771: Coding Guidelines for Datapath Synthesis.) Because the HDL-121 messages are directly related to the critical path, they should be addressed first.
There are two HDL-121 messages pointing to the registers that break the connection between DesignWare blocks:
Information: There is sequential cell between operator associated with 'mult_41 (top.v:41)' and 'mult_43 (top.v:43)' in design 'top'. (HDL-121)
Information: There is sequential cell between operator associated with 'mult_42 (top.v:42)' and 'mult_43 (top.v:43)' in design 'top'. (HDL-121)
Now that we have identified that manual pipelining is being done on logic that is in the critical path, the fix is to move the pipeline registers to the output of the datapath logic and do retiming using 'set_optimize_registers' in Design Compiler (see "Design Compiler Reference Manual: Register Retiming"). This will let the tool extract all relevant logic around the registers, choose the appropriate architectures and move the registers to the optimal locations to fix timing.
Bad QoR | Good QoR |
output reg [127:0] z3; | output reg [127:0] z3; |
Table 3: Addressing HDL-121 message in the design
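One way to express the change described above (registers moved to the output of the datapath logic) is sketched below. This is an assumption-based illustration rather than the original Table 3 listing: the module name z3_retime_sketch and the register name z3_pipe are hypothetical, and the module keeps only the z3 path of 'top', taking c1 through c4 as inputs for brevity.

```verilog
// Sketch only: z3_retime_sketch and z3_pipe are hypothetical names.
module z3_retime_sketch (
  input              clk, en,
  input      [32:0]  c1, c2, c4,
  input      [31:0]  c3,
  output reg [127:0] z3
);
  reg [127:0] z3_pipe;

  always @(posedge clk) begin
    if (en) begin
      // All arithmetic sits in one expression in front of the registers.
      // The right-hand side is sized to 128 bits by the assignment, so the
      // intermediate products are not truncated to 64 bits as c8 and c9 were.
      z3_pipe <= (c1 * c2) * (c3 * c4);
      // Second register stage keeps the original two-cycle latency.
      z3      <= z3_pipe;
    end
  end
endmodule
```

With both register stages placed back to back after the multipliers, no sequential cell separates the operators, and retiming is then free to move the registers into the multiplier logic. Because the intermediate products are no longer truncated, results can differ from the original code when a product exceeds 64 bits, so the intended widths should be confirmed before adopting this form.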
Note: While addressing HDL-121 messages, it is important to make sure that
The datapath leakage messages are high priority messages that, when addressed, provide improvement to QoR. These messages should be addressed first, especially for arithmetic operations on the critical paths. Shown below is the output of the analyze_datapath_extraction command used with the -sort option on the example design. The output lists the leakage messages from high to low priority.
****
Leakage detections on design 'top'
****
Msg type     Msg count     Max lkg width     Avg. lkg width
----------------------------------------------------------------
HDL-120      2             32                30
HDL-132      1             29                29
***Sorted leakage messages
Information: Operator associated with resources 'add_18 (top.v:18)' in design 'top' breaks the datapath extraction because there is leakage due to truncation on its fanout to operator of resources 'add_32 (top.v:32) add_32_2 (top.v:32) add_32_3 (top.v:32) add_32_4 (top.v:32) add_32_5 (top.v:32) add_32_6 (top.v:32)'. (HDL-120)
Information: The output of subtractor associated with resources 'sub_34 (top.v:34)' is treated as signed signal. (HDL-132)
Information: Operator associated with resources 'add_35 (top.v:35)' in design 'top' breaks the datapath extraction because there is leakage due to driver(signed)/load(unsigned) sign mismatch on fanin from operator of resources 'sub_34 (top.v:34)'. (HDL-120)
Table 4 lists the code corresponding to the leakage message and the corresponding RTL fix for QoR improvement.
Bad QoR | Good QoR | |
1. | input [15:0] a3, b3, b4; | input [15:0] a1, a2, a3, a4, a5, a6, a7; |
2. | input [15:0] a7, b7; | input [15:0] a7, b7; |
Table 4: Addressing HDL-120 and HDL-132 messages in the design
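The kind of RTL change these messages call for can be sketched as follows. This is not the original Table 4 listing. The module name leak_fix_sketch and the output z1_part are hypothetical, and the signed handling of c7_t is only one possible way to make the sign intent explicit; it is an assumption, not the article's confirmed fix.

```verilog
// Sketch only: leak_fix_sketch and z1_part are hypothetical names.
module leak_fix_sketch (
  input  [15:0] a3, b3, b4,
  input  [15:0] a7, b7,
  output [32:0] z1_part,   // stands in for the 33-bit z1 sum in 'top'
  output [32:0] z2
);
  // Truncation leakage (HDL-120): c3 was declared [31:0] in the example
  // design. Declaring it [32:0], like c1, c2, and c4, keeps the full
  // result of a3*b3 + b4 before it enters the 33-bit sum.
  wire [32:0] c3 = a3 * b3 + b4;
  assign z1_part = c3 + b7;

  // Sign mismatch (HDL-132 and HDL-120): assumed fix. The subtraction
  // result is declared signed and the unsigned inputs are zero-padded
  // before being cast, so driver and load agree on signedness.
  wire signed [28:0] c7_t = 0 - $signed({1'b0, a7});
  assign z2 = c7_t + $signed({1'b0, b7});
endmodule
```

In this sketch z2 evaluates to b7 - a7 in 33-bit two's complement, which differs from the zero-extended result of the original code. Whether that matches the designer's intent has to be decided before the change is made.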
Table 5 lists the QoR improvement seen due to fixing timing violations as well as the high priority datapath leakage issues in the design.
S.No | Metric | Before fixes | After fixing timing violations and leakage messages | Percentage improvement
1 | Critical Path Slack | -0.13 | 0 |
2 | Critical Path Length | 4.54 | 4.19 |
3 | Number of Violating Paths | 81 | 0 |
4 | Total Negative Slack | -8.22 | 0 |
5 | Area | 31511.92 | 23840 | 24.349%
Table 5: QoR improvement seen after fixing timing violations and datapath leakage based on analyze_datapath_extraction messages
*The runs were performed using a 28nm HP library using I-2013.12-SP2
Now that all the timing violations have been fixed and the area is reduced, the next step is to address the lower priority messages to improve QoR. The following section lists the lower priority messages issued in this design and the QoR benefit seen by fixing them.
Listed are the two lower priority analyze_datapath_extraction messages:
Information: Missing possible extraction across cell 'mult_5 (top.v:5)' in design 'logicblock' and cell 'add_32_4 (top.v:32)' in design 'top'. (HDL-126)
Information: Cell 'U1 (top.v:29)' in design 'top' cannot be extracted because it instantiates 'DW02_mult', which could be inferred with operator '*' instead. (HDL-125)
Bad QoR | Good QoR |
module logicblock ( | module top ( |
module top ( | module top ( |
Table 6: Addressing HDL-126 and HDL-125 messages
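The two messages above suggest changes along the following lines. This is a minimal sketch, not the original Table 6 listing: the module name infer_sketch is hypothetical, and only the two affected multiplications are shown.

```verilog
// Sketch only: infer_sketch is a hypothetical name.
module infer_sketch (
  input  [15:0] a5, b5, a6, b6,
  output [31:0] c5, c6
);
  // HDL-126: the multiplication from 'logicblock' is written in the same
  // hierarchy as the adder chain that consumes it, so no hierarchy
  // boundary separates the operators.
  assign c5 = a5 * b5;

  // HDL-125: an inferred multiplier replaces the DW02_mult instance.
  // TC = 1'b0 in the original instance selects unsigned operation,
  // which matches '*' applied to unsigned operands.
  assign c6 = a6 * b6;
endmodule
```

In the example design these two assignments would replace the i_logicblock instance and the U1 instance inside 'top', leaving the sum that drives z1 unchanged.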
Table 7 shows the QoR improvement after fixing the RTL based on lower priority HDL messages from analyze_datapath_extraction.
Table 7: QoR improvement seen after fixing the RTL based on lower priority HDL messages from analyze_datapath_extraction
*The runs were performed using a 28nm HP library using I-2013.12-SP2
In large designs, the analyze_datapath_extraction command may report a large number of HDL messages. In such scenarios, it is important to focus on the HDL messages that directly relate to critical blocks in the design and address them first.
The following guidelines can be useful in simplifying the process to address the issues identified by analyze_datapath_extraction:
The other low priority messages can provide additional marginal improvement if addressed. Refer to the man page of the analyze_datapath_extraction command for more information.
The analyze_datapath_extraction command provides valuable guidance to help identify the areas where RTL coding is not optimal for efficient datapath synthesis. Designers can use the output of this command and focus on the HDL messages that directly impact the critical blocks to get the most benefit. In the example design, small modifications to the RTL based on these HDL messages fixed the timing violations while also reducing area. Designers should consider using the RTL coding guidelines together with the HDL messages from the analyze_datapath_extraction command to get the best possible QoR in datapath designs.