By Matthew Myers, USB Hardware Engineer, 草榴社区
Over the years, certain communication protocols have included a special class of traffic, called isochronous, that provides Quality of Service (QoS). Unlike file transfers in computer systems, isochronous data transfers do not need guaranteed delivery, but they do need guaranteed service opportunities. Typical examples of isochronous traffic are audio and video streams where the synchronization of the two must be strictly maintained and the latency must be kept to a minimum, but a slight glitch in either one does not result in devastating data corruption.
Since its introduction in 1996, the USB specification has defined the isochronous transfer type as providing QoS by having "guaranteed bandwidth" with "bounded latency." Because the host performs all of the transfer scheduling in USB, most of the responsibility for delivering on these promises depends on the host in ways that span software, hardware, and the USB protocol.
However, each time the USB protocol is enhanced with a new speed but keeps backward compatibility, modifications are often necessary in hosts and hubs. For example, when the USB 2.0 specification added "High Speed" to existing "Low/Full Speed" devices, the specification also changed the way isochronous transfers were scheduled to maintain the guaranteed bandwidth and latency. USB 2.0 hubs needed a "Transaction Translator" to bridge the gap between a High Speed upstream port and a Low/Full Speed downstream port. Hosts were forced to split their requests to Low/Full Speed devices behind these hubs in order to not affect the bandwidth and latency requirements of isochronous endpoints on High Speed devices.
The same is true of 10G USB 3.1, the first USB specification in which devices of two different speeds coexist in the same topology on the USB 3.x wires. As shown in Figure 1, significant changes are needed to continue the promise of guaranteed bandwidth and bounded latency, including the following:
In this article, we explain how each piece of the USB 3.1 topology needs to be modified to support isochronous traffic in a mixed speed environment.
Figure 1: Scope of USB 3.1 isochronous changes in the topology
The article "Achieving 10 Gbps Data Rates in USB 3.1 Using Multiple INs and Hub Payload Buffering" described the reason for two of the major enhancements in the USB 3.1 protocol. With the addition of multiple IN transactions, multiple OUT transactions (which are already allowed in USB 3.0), and hub payload buffering to deal with rate matching, USB 3.1 hubs will find themselves in situations where they have to choose between different packets to transmit on a port (upstream or downstream).
In Figure 2, the hub has an isochronous data packet ready to transmit upstream towards the host as well as a bulk packet. Without proper rules in the hub about which packets have higher priority, it is possible that the bulk packet would end up blocking and delaying the isochronous packet, interfering with the bounded latency guarantee.
Figure 2: Host starts multiple INs; Hub needs to choose packet to transmit upstream
In USB 3.1, if a hub port has multiple packets buffered up for transmission, it is required to service the packets using the following priority order to choose which one to transmit first:
With the arbitration rules, the hub picks the isochronous packet first, resulting in one of the sources of reordering that can now occur in USB 3.1.
To support the distinction between priority rules #2 and #3 above, packet headers (TPs and DPs) in USB 3.1 now include the transfer type in a previously reserved region of the packet format. This transfer type is set to control, bulk, isochronous, or interrupt, and it is produced by 3.1 hosts and devices for use by the hub for prioritization.
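The arbitration described above can be sketched in a few lines of Python. This is only an illustration, not the specification's algorithm: the ordering (TPs first, then interrupt/isochronous DPs, then bulk/control DPs) is taken from statements elsewhere in this article, and the names `Packet`, `priority_class`, and `pick_next` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum, auto


class TransferType(Enum):
    # Symbolic names only; these are not the on-the-wire field encodings.
    CONTROL = auto()
    BULK = auto()
    ISOCHRONOUS = auto()
    INTERRUPT = auto()


@dataclass
class Packet:
    kind: str                    # "TP" (transaction packet) or "DP" (data packet)
    transfer_type: TransferType  # carried in the USB 3.1 packet header
    label: str = ""


def priority_class(pkt):
    """Return the arbitration class of a packet; lower is sent first."""
    if pkt.kind == "TP":
        return 0
    if pkt.transfer_type in (TransferType.ISOCHRONOUS, TransferType.INTERRUPT):
        return 1
    return 2  # bulk and control DPs


def pick_next(buffered):
    """Choose the buffered packet to transmit next, or None if empty.

    min() returns the first minimal element, so packets of the same
    class keep their arrival order.
    """
    if not buffered:
        return None
    return min(buffered, key=priority_class)


queue = [
    Packet("DP", TransferType.BULK, "bulk"),
    Packet("DP", TransferType.ISOCHRONOUS, "isoc"),
]
print(pick_next(queue).label)  # isoc
```

Even though the bulk packet arrived first in this example, the isochronous packet is selected, which is the reordering effect the article describes.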
For these priority rules, there is one backward compatibility problem when a 3.0 device is connected to a 3.1 hub. For an IN transaction, the 3.0 device will be using the old packet format which has no transfer type when it transmits its DP. The hub would be unable to determine the priority of that packet versus packets from 3.1 devices, as shown in Figure 3.
USB 3.1 hubs have a new requirement to store the transfer type generated in the ACK TP from the host in a transfer type table so that they can modify the DP from the device and insert the correct transfer type. In this way, the arbitration rules work with existing 3.0 devices.
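As a rough sketch of the idea, the hub can be modeled as recording the transfer type from each downstream ACK TP and stamping it into upstream DPs that arrive without one. The class and field names below, and the choice of (device address, endpoint number) as the lookup key, are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AckTP:
    device_address: int
    endpoint: int
    transfer_type: str  # "control", "bulk", "isochronous", or "interrupt"


@dataclass
class DataPacket:
    device_address: int
    endpoint: int
    transfer_type: Optional[str] = None  # None models a USB 3.0 DP


class TransferTypeTable:
    """Hypothetical per-hub table keyed by (device address, endpoint)."""

    def __init__(self):
        self._table = {}

    def on_downstream_ack(self, ack):
        # Remember the transfer type the host placed in the ACK TP.
        self._table[(ack.device_address, ack.endpoint)] = ack.transfer_type

    def on_upstream_dp(self, dp):
        # Fill in the transfer type only when the device did not supply one.
        if dp.transfer_type is None:
            dp.transfer_type = self._table.get((dp.device_address, dp.endpoint))
        return dp


table = TransferTypeTable()
table.on_downstream_ack(AckTP(device_address=3, endpoint=1, transfer_type="isochronous"))
dp = table.on_upstream_dp(DataPacket(device_address=3, endpoint=1))
print(dp.transfer_type)  # isochronous
```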
Figure 3: Transfer type tables in USB 3.1 hubs support USB 3.0 devices
Interrupt/isochronous DPs have a higher absolute priority than bulk/control DPs, so this implies that hubs must have separate buffering for these two classes of packets. In fact, a hub needs separate buffering for each of the two classes per port in the following amounts (each packet size is assumed to be 1024 bytes):
Table 1: Packet buffer requirements for USB 3.1 hubs
This is significantly more buffer space than in USB 3.0 hubs, which have an elasticity buffer that may be capable of storing 1 to 3 packets. Because TPs have a higher priority than interrupt/isochronous DPs, this buffer needs to be randomly accessible rather than a FIFO.
The link layer of USB 3.0 defined the terminology of “link credits.” The link layer protocol ensures the delivery of packet headers from the transmitter to the receiver, using the credits as a backpressure mechanism to report whether the receiver has enough buffer space to accept another packet header. The receiver must be able to buffer at least 4 packet headers.
In USB 3.1, even though the hub now has separate buffers for isochronous packets to allow them to bypass bulk packets, there is still another obstacle to providing guaranteed bandwidth and latency. To demonstrate the problem, imagine a scenario where the upstream port of a hub is transmitting bulk packets toward the host. If the host stops returning link credits for a period of time, the port will be unable to transmit any more packets until the host releases a credit. Now if a device on another port transmits an isochronous packet upstream (Figure 4), it cannot continue to the host even though the arbitration rules say that the port must choose the isochronous packet over the bulk packet.
Figure 4: Isoc packet is blocked for upstream transmission due to lack of credits
To solve this piece of the puzzle, the USB 3.1 specification adds another link credit type, which means the link layer needs buffering for 4 more packet headers. Traffic is separated into Type 1 and Type 2:
As shown in Figure 5, asynchronous traffic that consumes all four of the Type 2 credits on a link cannot block the transmission of isochronous traffic which uses separate Type 1 credits.
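The effect of the two credit pools can be illustrated with a small model. This is a simplified sketch and not the link layer protocol itself: it tracks only a counter per credit type, uses the four-credit depth mentioned in the text, and maps only the two traffic classes named above. The `LinkCredits` class and its methods are hypothetical.

```python
# Mapping taken from the text: isochronous uses Type 1, asynchronous uses Type 2.
CREDIT_TYPE = {"isochronous": 1, "asynchronous": 2}


class LinkCredits:
    """Transmitter-side view of the credits a link partner has granted."""

    def __init__(self, credits_per_type=4):
        self.available = {1: credits_per_type, 2: credits_per_type}

    def try_send(self, traffic):
        """Consume one credit of the matching type; return False if none remain."""
        ctype = CREDIT_TYPE[traffic]
        if self.available[ctype] == 0:
            return False
        self.available[ctype] -= 1
        return True

    def return_credit(self, ctype):
        """Called when the receiver frees a header buffer of this type."""
        self.available[ctype] += 1


link = LinkCredits()
for _ in range(4):
    link.try_send("asynchronous")      # bulk traffic uses up all Type 2 credits
print(link.try_send("asynchronous"))   # False
print(link.try_send("isochronous"))    # True
```

With a single shared pool, the last call would also fail, which is the situation shown in Figure 4.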
Figure 5: Isoc packet can be transmitted upstream because of separate link credits
To explain the need for this final new isochronous feature in USB 3.1, take the example of an isochronous IN endpoint on a USB 3.0 device that can return 4 packets per microframe. When the entire topology is running at USB 3.0 speeds and hubs are not buffering data, it is relatively efficient for the host to request the 4 packets, 2 at a time. The only delay is the time through the hub, as shown in Figure 6.
Figure 6: Isochronous delay through hub in USB 3.0
Given that hosts will issue multiple IN transactions and hubs will buffer payload data, there is a performance problem with isochronous transactions that interferes with the "bounded latency" guarantee when more than one device is connected behind a USB 3.1 hub. Figure 7 depicts the system inefficiencies of plugging a USB 3.0 device into a USB 3.1 hub. The host performs multiple IN transactions to an isochronous endpoint on Device 0 and a bulk endpoint on Device 1.
Figure 7: Isochronous delay through hub in USB 3.1 (without Pipelined Isoch IN)
In Figure 7, the long, inefficient delays on the host’s and device’s links occur for two reasons:
With more devices in the topology and a larger hub depth, these inefficiencies add up. Each hub can add up to 400 ns of delay in each direction. In a five-tier system, the propagation of the ACK could incur 2 µs of downstream delay, and the propagation of the DPs could incur 2 µs of upstream delay. In addition, it is possible that each level of hub is already transmitting a packet upstream, which would cause an upstream delay of up to 5 µs (a 1K packet takes about 1 µs to transmit on the 10G link). All told, a device could see as much as 8 µs of delay between ACKs, which interferes dramatically with the bounded latency guarantee, as seen in Figure 8.
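The arithmetic behind this estimate can be worked through in a short sketch. It assumes the raw payload time of 1024 bytes at 10 Gbps (about 0.82 µs, which the text rounds to 1 µs) and ignores encoding overhead and packet headers; the function name and parameters are hypothetical.

```python
def worst_case_ack_gap_ns(tiers=5, hub_delay_ns=400,
                          packet_bytes=1024, link_rate_bps=10e9):
    """Estimate the delay between ACKs seen by a device, in nanoseconds."""
    downstream = tiers * hub_delay_ns   # ACK travelling toward the device
    upstream = tiers * hub_delay_ns     # DPs travelling toward the host
    # Raw time to serialize one packet payload on the link.
    packet_time_ns = packet_bytes * 8 / link_rate_bps * 1e9
    # Each hub tier may already be busy sending one packet upstream.
    blocking = tiers * packet_time_ns
    return downstream + upstream + blocking


print(round(worst_case_ack_gap_ns() / 1000, 1))  # 8.1 (microseconds)
```

Under these assumptions the total is roughly 8.1 µs, in line with the figure quoted above. Summing the rounded per-packet value of 1 µs instead would give 9 µs.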
Figure 8: Delay between ACKs interferes with bounded latency guarantee
Therefore, USB 3.1 introduces the "Pipelined Isochronous IN" feature to remedy this problem. It means that the host can send another ACK to the isochronous endpoint, requesting more data preemptively, before the device has returned all the packets from the previous request (Figure 9). The delay between packets at the host is reduced, as well as the delay between transmissions by the device.
Figure 9: Isochronous delay through hub in USB 3.1 (with Pipelined Isoch IN)
There are some obvious restrictions on this behavior:
The USB 3.0 specification required hosts to either perform a single burst or split isochronous transfers into smaller bursts of 2, 4, or 8 DPs followed by a final burst with the remaining DPs for that service interval. The classic example from the spec is the list of possibilities of how a host can burst 11 packets to or from a device:
This restriction was intended to aid hosts and devices in scheduling isochronous transfers by reducing the set of possibilities to bursts that are powers of two. However, this artificial restriction did not prove to be very useful, so it was removed in USB 3.1. Now, hosts can decide what bursting pattern is most efficient based on the overall isochronous schedule and the topology of the bus.
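To make the old USB 3.0 rule concrete, the sketch below enumerates the burst patterns it would allow for a given number of packets. It is based only on the wording above, with one added assumption: all of the smaller bursts in a pattern use the same size (2, 4, or 8). The function name is hypothetical, and the output is what this sketch produces, not a quotation from the specification.

```python
def usb30_burst_patterns(total):
    """List burst patterns for `total` DPs under the USB 3.0 rule as described."""
    patterns = [[total]]              # a single burst is always allowed
    for size in (8, 4, 2):
        if size >= total:
            continue                  # cannot split into bursts this large
        full, rest = divmod(total, size)
        pattern = [size] * full
        if rest:
            pattern.append(rest)      # final burst with the remaining DPs
        patterns.append(pattern)
    return patterns


for pattern in usb30_burst_patterns(11):
    print(pattern)
# [11]
# [8, 3]
# [4, 4, 3]
# [2, 2, 2, 2, 2, 1]
```

In USB 3.1, where the restriction is removed, a host would be free to choose other patterns that fit its schedule.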
In USB 3.0, responsibility for isochronous bandwidth and latency guarantees was primarily left to the host controller because the bus acted like a single lane of traffic with each transaction completing sequentially. The addition of multiple INs and hub buffering in USB 3.1 requires the creation of a virtually separate bus for isochronous traffic that extends from the host, through the hubs, to the device. This virtually separate bus is supported by pipelined isochronous IN transactions, hub buffering and arbitration rules, and separate link credits. This cascading set of requirements satisfies the original promise of "guaranteed bandwidth" with "bounded latency" that was made 18 years ago.