
Understanding Bandwidth: Back to Basics

Richard Solomon

Apr 21, 2016 / 3 min read

One of the questions I’ve been getting a lot recently is along the lines of “How many lanes and of what ‘generation’ of PCI Express do I need?”

This is a fairly straightforward question, and while coming up with a good first-order estimation is also fairly straightforward, it’s not necessarily obvious in the PCI Express specification.  Let’s start with the “raw” data rate, which is fairly easy:

 

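PCI Express Data Rates

| Generation | Raw data rate (per lane) |
|------------|--------------------------|
| “Gen1”     | 2.5 Gb/s                 |
| “Gen2”     | 5 Gb/s                   |
| “Gen3”     | 8 Gb/s                   |
| “Gen4”     | 16 Gb/s                  |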

 

Folks who are new to PCIe may be scratching their heads right about now and thinking “Richard said before that each generation of PCIe has doubled the bandwidth…. so what happened between Gen2 and Gen3??!?!” That leads us to the second piece of the puzzle – the encoding scheme. The original PCI Express specification used “8b10b” encoding – which means every 8 bits of data were expanded to 10 bits when sent on the wire. I won’t go into the details here of why this was done, but it was a common technique for limiting “runs” of 0s and 1s in the data stream. When the 5 Gb/s “Gen2” data rate was developed, it kept the same encoding scheme. However, when “Gen3” was being developed, it was hoped that by limiting the actual signaling rate to something below 10 Gb/s, simpler receivers could be defined (this ultimately didn’t happen, but that’s a story for another Flashback I suppose). To do that, and still keep a “doubling”, the encoding scheme for “Gen3” was changed to 128/130 – meaning every 128 bits of data get expanded only to 130 bits (instead of to 160 as 8b10b would have).

So 5 Gigabits/second multiplied by 8/10 gives 4 Gigabits/second of effective data transfer, while 8 Gigabits/second multiplied by 128/130 gives 7.88 Gigabits/second, which is close enough to double.

“Ok Richard, I’ve got it – so I take the data rate, multiply by the encoding factor and I’ve got my real per-lane data rate, right?”

 

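PCI Express Data Rates

| Generation | Data Rate | Encoding Factor |
|------------|-----------|-----------------|
| “Gen1”     | 2.5 GT/s  | (8/10)          |
| “Gen2”     | 5 GT/s    | (8/10)          |
| “Gen3”     | 8 GT/s    | (128/130)       |
| “Gen4”     | 16 GT/s   | (128/130)       |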
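As a quick check on that first step, here is a minimal Python sketch that multiplies each raw rate by its encoding factor. The dictionary and function names are illustrative choices of mine, not anything from the PCI Express specification; the numbers are simply the ones quoted in the tables above.

```python
# Raw per-lane signaling rate (Gb/s) and encoding (payload bits, wire bits)
# for each generation, using the figures quoted in this post.
GENERATIONS = {
    "Gen1": (2.5, 8, 10),
    "Gen2": (5.0, 8, 10),
    "Gen3": (8.0, 128, 130),
    "Gen4": (16.0, 128, 130),
}


def encoded_rate_gbps(generation):
    """Per-lane data rate in Gb/s after removing the encoding overhead."""
    raw_gbps, payload_bits, wire_bits = GENERATIONS[generation]
    return raw_gbps * payload_bits / wire_bits


for gen in GENERATIONS:
    print(f"{gen}: {encoded_rate_gbps(gen):.2f} Gb/s per lane")
```

This covers only the encoding step; packet overhead still has to be accounted for separately.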

 

Packet Efficiency

That’s the first step, yes, but I’m afraid there’s one more piece of the puzzle – the packet efficiency. This is just a reflection of the fact that there is overhead to every packet sent on PCI Express. Firstly, every data packet includes a header which is either 3 or 4 DWORDs (32-bit or 4-byte chunks), so we add 12 or 16 bytes of overhead for that. Every data packet also includes a 1 DWORD LCRC, so add 4 more bytes for that. Then there is a sequence number and some start/stop information – for simplicity we’ll pretend that’s always another 4 bytes total. (While true for “Gen1” and “Gen2”, the 128/130 encoding scheme makes this not exactly accurate for “Gen3” and “Gen4”, but it will do for our purposes at the moment.) Lastly, there is an optional End-to-End CRC called the ECRC which can be included in packets as well, at a cost of another 4 bytes.
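The overhead tally above can be written down as a small Python helper. This is a sketch under the same simplification used in the text (a flat 4 bytes for sequence number plus start/stop framing); the function name and parameters are hypothetical.

```python
DWORD_BYTES = 4  # one DWORD is 32 bits, i.e. 4 bytes


def packet_overhead_bytes(header_dwords, ecrc=False):
    """Approximate per-packet overhead in bytes for a PCIe data packet."""
    if header_dwords not in (3, 4):
        raise ValueError("header must be 3 or 4 DWORDs")
    overhead = header_dwords * DWORD_BYTES  # header: 12 or 16 bytes
    overhead += DWORD_BYTES                 # 1 DWORD LCRC
    overhead += 4                           # sequence number + start/stop (simplified)
    if ecrc:
        overhead += DWORD_BYTES             # optional End-to-End CRC
    return overhead


print(packet_overhead_bytes(3))             # 3-DWORD header, no ECRC
print(packet_overhead_bytes(4))             # 4-DWORD header, no ECRC
print(packet_overhead_bytes(4, ecrc=True))  # 4-DWORD header with ECRC
```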

Since ECRC isn’t commonly used, let’s just look at 3 DWORD and 4 DWORD header packets and add those 20 or 24 bytes of overhead to our PCI Express packet sizes.  So for 128 byte packets, we actually have to send 128+20=148 or 128+24=152 bytes, which means our packet efficiency is 128/148=0.865 or 128/152=0.842.  Doing that math for the rest of the packet sizes and expressing efficiency as a percentage gives:

 

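Efficiency (%) for Various Packet Sizes (Bytes)

| Header Size | 128   | 256   | 512   | 1024  | 2048  | 4096  |
|-------------|-------|-------|-------|-------|-------|-------|
| 3-DWORD     | 86.5% | 92.8% | 96.2% | 98.1% | 99.0% | 99.5% |
| 4-DWORD     | 84.2% | 91.4% | 95.5% | 97.7% | 98.8% | 99.4% |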
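If you would rather regenerate these efficiency figures than trust my arithmetic, a short Python sketch along these lines should do it. It assumes the 20- and 24-byte overheads discussed above (no ECRC), and the names are purely illustrative.

```python
PACKET_SIZES = (128, 256, 512, 1024, 2048, 4096)  # payload bytes
OVERHEAD_BYTES = {"3-DWORD": 20, "4-DWORD": 24}   # per-packet overhead, no ECRC


def packet_efficiency(payload_bytes, overhead_bytes):
    """Fraction of the bytes on the wire that are actual payload."""
    return payload_bytes / (payload_bytes + overhead_bytes)


for header, overhead in OVERHEAD_BYTES.items():
    row = "  ".join(
        f"{100 * packet_efficiency(size, overhead):5.1f}%" for size in PACKET_SIZES
    )
    print(f"{header}: {row}")
```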

 

So *NOW* you’ve got the calculation down!  Take the “raw” data rate, multiply by the encoding factor, then by the packet efficiency to get the effective data rate per lane.  Of course if you’re using a multi-lane implementation, you get to multiply that by the number of lanes. 

Header Packets

I should also mention that generally the use of 3-DWORD vs 4-DWORD headers is tied to whether your system is addressing 32-bit or 64-bit memory.  So a small client system with less than 4GB of main memory might well use 100% 3-DWORD header packets, while a huge server running I/O Virtualization might come close to 100% 4-DWORD header packets.  You could just be pessimistic and assume 100% 4-DWORD headers or you could make your own assumptions. (Averaging the 3-DWORD and 4-DWORD efficiencies isn’t uncommon – which is probably where the “85%” number commonly batted around as the “PCIe efficiency” comes from: 128-byte packets with an even mix of 3-DWORD and 4-DWORD headers.) 
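To experiment with your own header mix, something like the following Python sketch may help. It blends the two efficiencies by a simple weighted average, as described above, which is a first-order shortcut rather than an exact traffic model; the function name and the `fraction_4dw` parameter are my own assumptions.

```python
def mixed_header_efficiency(payload_bytes, fraction_4dw):
    """Weighted average of the 3-DWORD and 4-DWORD packet efficiencies.

    fraction_4dw is the assumed share of packets using 4-DWORD headers (0.0 to 1.0).
    """
    if not 0.0 <= fraction_4dw <= 1.0:
        raise ValueError("fraction_4dw must be between 0 and 1")
    eff_3dw = payload_bytes / (payload_bytes + 20)  # 3-DWORD header overhead
    eff_4dw = payload_bytes / (payload_bytes + 24)  # 4-DWORD header overhead
    return (1.0 - fraction_4dw) * eff_3dw + fraction_4dw * eff_4dw


# 128-byte packets with an even mix of header sizes: about 85%
print(f"{100 * mixed_header_efficiency(128, 0.5):.1f}%")
```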

So for an 8 lane (aka “x8”) “Gen3” implementation running 256-byte packets and using the more pessimistic 4-DWORD efficiency, we get: 8 Gb/s * (128/130) * (0.914) * 8 = 57.6 Gb/s, or 7.2 GB/s.
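The whole estimate fits in a few lines of Python. Treat this as a first-order sketch only: the function name and argument layout are hypothetical, and it uses the simplified 24-byte overhead from earlier in the post.

```python
def effective_bandwidth_gbps(raw_gbps, encoding, payload_bytes, overhead_bytes, lanes):
    """First-order effective bandwidth estimate in Gb/s.

    encoding is a (payload_bits, wire_bits) pair such as (8, 10) or (128, 130).
    """
    payload_bits, wire_bits = encoding
    encoding_factor = payload_bits / wire_bits
    packet_efficiency = payload_bytes / (payload_bytes + overhead_bytes)
    return raw_gbps * encoding_factor * packet_efficiency * lanes


# x8 "Gen3", 256-byte packets, 4-DWORD headers (24 bytes of overhead)
gbps = effective_bandwidth_gbps(8.0, (128, 130), 256, 24, lanes=8)
print(f"{gbps:.1f} Gb/s = {gbps / 8:.1f} GB/s")  # 57.6 Gb/s = 7.2 GB/s
```

Swapping in a different generation, packet size, or lane count is just a matter of changing the arguments.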

Clear as mud?

Probably needless to say, but I’ll say it anyway – if this estimation is very close to your actual bandwidth needs and if it’s critical you never fall short, then do a more detailed analysis! In real-world systems we’ve used logic analyzers on actual hardware and measured the Synopsys controller IP hitting better than 98% of these numbers, but there are obviously many factors which can come into play. Contact your friendly local Synopsys Application Engineer or drop a note to me if you need help digging deeper into your own PCI Express application.
