Bufferbloat is a well-known phenomenon,7 in which the deepest buffer on a network path between two hosts is eventually filled by TCP. Ostensibly, system designers increase buffer sizes to reduce loss, but deeper buffers increase the actual time taken for packets to traverse a path, inflating the RTT and delaying TCP's detection of loss events. Loss is the driver for TCP's congestion-control algorithm, so increasing buffer size is actually counterproductive.
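As a rough, back-of-the-envelope illustration of why a deep buffer inflates RTT (the 10Mbps bottleneck rate below is an assumption chosen purely for illustration; only the buffer sizes come from the experiment described next), the time to drain a full drop-tail buffer is simply its size divided by the link rate:

```python
# Back-of-the-envelope queuing delay added by a full drop-tail buffer.
# The 10Mbps bottleneck rate is an assumption for illustration only;
# the buffer sizes mirror those used in the experiment described next.
BOTTLENECK_BPS = 10_000_000  # assumed link rate, bits per second

for buf_bytes in (10_000, 100_000, 200_000, 300_000):
    drain_seconds = (buf_bytes * 8) / BOTTLENECK_BPS
    print(f"{buf_bytes // 1000}kB buffer -> ~{drain_seconds * 1000:.0f}ms added queuing delay")
```

At an assumed 10Mbps, a full 300kB buffer adds roughly 240ms of queuing delay to every packet that arrives behind it, which is the kind of inflation visible in the measurements below; the experiment's actual link rate is not stated here.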
To demonstrate bufferbloat in this experiment, tc queue sizes on the forwarding host were systematically increased from 10kB to 100kB, then 200kB, and finally 300kB, and netcat was used to create a high-bandwidth flow between the end hosts prior to starting the client/server application. The intention of the high-bandwidth flow was to fill the longer queues on the forwarding host, demonstrating that the queue-draining time affects application responsiveness.
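The client/server test application itself is not listed here; the following is a minimal sketch of the kind of application-layer RTT probe described, sending a small message over TCP once per second and timing its echo. The server address, port, and the assumption that the peer simply echoes each payload back are illustrative, not taken from the experiment.

```python
import socket
import time

# Minimal application-layer RTT probe (an illustrative sketch, not the
# article's actual test application). It sends a small payload once per
# second over a TCP connection and times how long the echoed reply takes.
SERVER = ("192.0.2.10", 9000)   # assumed address/port of a simple echo server

def probe(samples=180):
    with socket.create_connection(SERVER) as sock:
        # Disable Nagle so each small probe is sent immediately.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for i in range(samples):
            payload = f"ping {i:06d}".encode()
            start = time.monotonic()
            sock.sendall(payload)
            data = b""
            while len(data) < len(payload):      # wait for the full echo
                chunk = sock.recv(len(payload) - len(data))
                if not chunk:                    # server closed the connection
                    return
                data += chunk
            rtt_ms = (time.monotonic() - start) * 1000
            print(f"t={i}s  application-layer RTT: {rtt_ms:.1f} ms")
            time.sleep(1)                        # one probe per second

if __name__ == "__main__":
    probe()
```

Comparing a probe of this kind against TCP-layer and ICMP measurements of the same path is what yields the per-layer RTT curves shown in Figures 4 and 5.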
The results of the experiment are shown in Figures 4 and 5. Figure 4 shows the dispersion of RTT measurements as the buffer sizes were increased. Focusing on the 300kB test in Figure 5, very similar RTT measures are evident from both hosts in the ICMP measurements, at the TCP layer, and in the application layer; mean and median values for all layers in these experiments were within 2ms of each other. All RTT measures are inflated by the same amount because the excessive buffer size effectively increases the network-layer path length. Given that the test application emits only a handful of packets once per second, the saw-tooth pattern indicates the netcat data filling a queue and TCP then waiting for the queue to drain before sending more of netcat's data, producing a bursty pattern. These filled queues adversely affect the delivery of all other traffic, and as a result our test application suffers RTTs that vary from 100ms to about 250ms.
The bufferbloat problem is being actively worked on. Mechanisms such as Selective Acknowledgments (SACK), Duplicate SACK (DSACK), and Explicit Congestion Notification (ECN), when enabled, all help alleviate bufferbloat. Additionally, active queue-management strategies such as CoDel have been accepted into mainline Linux kernels.
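As an illustrative sketch of how these mechanisms surface on a Linux host (the sysctl paths and the fq_codel qdisc are standard Linux facilities, but the interface name eth0 is an assumption for this example, and changing any of these settings requires root):

```python
import pathlib
import subprocess

# Inspect the TCP mechanisms mentioned above via their standard sysctl
# entries, then install a CoDel-based queue discipline with tc.
SYSCTLS = {
    "SACK":  "/proc/sys/net/ipv4/tcp_sack",
    "DSACK": "/proc/sys/net/ipv4/tcp_dsack",
    "ECN":   "/proc/sys/net/ipv4/tcp_ecn",
}

for name, path in SYSCTLS.items():
    value = pathlib.Path(path).read_text().strip()
    print(f"{name} setting ({path}): {value}")
    # pathlib.Path(path).write_text("1\n")   # as root: enable the mechanism

# Replace the root qdisc on the (assumed) eth0 interface with fq_codel,
# the CoDel-based active queue management shipped in mainline kernels.
subprocess.run(
    ["tc", "qdisc", "replace", "dev", "eth0", "root", "fq_codel"],
    check=True,
)
```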
In summary, it is clear that to minimize delays caused by head-of-line blocking in TCP, packet loss must be kept to a minimum. Given that we must expect packet loss as a primary driver of TCP's congestion-control algorithm, we must also be careful to minimize network buffering and avoid the delays incurred by bufferbloat. The latter requirement in particular is useful to keep in mind when provisioning networks for time-critical data that must be delivered reliably.
Related Work
The key issue when using TCP for time-sensitive applications is that TCP offers a reliable bytestream. This requirement is distinct from other key aspects of TCP, such as congestion control and flow control. TCP is not suitable for all applications, however. Eli Brosh et al. discuss in more detail the behavior of TCP in the presence of delay and certain acceptability bounds for application performance.1
UDP9 is the most commonly used
transport protocol after TCP; it’s a
datagram-oriented protocol with
no congestion control, flow control,
or message-ordering mechanisms.
It effectively augments the IP layer
with UDP-layer port numbers. Without the message-ordering constraint,
it is not affected by the head-of-line
blocking problem that can affect
TCP connections.
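A brief sketch of the datagram model described above (the address and port are illustrative assumptions): each sendto() hands the network an independent datagram, so a lost or delayed datagram never holds up delivery of the ones behind it.

```python
import socket

PORT = 9999                     # illustrative port number

def send(messages, host="192.0.2.10"):
    # Each datagram is independent: no connection, no retransmission,
    # no ordering, so nothing queues behind a lost packet the way
    # bytes do in a TCP stream.
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    for msg in messages:
        sock.sendto(msg.encode(), (host, PORT))   # fire and forget

def receive():
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", PORT))
    while True:
        data, addr = sock.recvfrom(2048)
        # Datagrams may arrive reordered, duplicated, or not at all;
        # any reliability or ordering is left to the application.
        print(addr, data.decode())
```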
UDP alone is not suitable for many applications, however, because reliability is often required.
Figure 5. Indication of measured RTT values in the presence of excessive buffering. Top: delay over time (RTT in ms vs. time in seconds) measured from host B with a 300kB buffer on-path, showing TCP and ICMP RTT measurements. Bottom: delay over time measured from host A with a 300kB buffer on-path, showing application-layer, TCP, and ICMP RTT measurements.