operation, introducing negligible CPU overhead in the common case. In contrast, “corrupted incremental verify” measures the average time required to reject a corrupted signature. Using incremental verification, Clue achieves a 70% reduction in overhead over the original scheme.
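The cheaper rejection path can be illustrated with a minimal sketch. This is not Clue's code; it only shows the general shape of a verifier that runs its checks in stages and stops at the first failure, which is why rejecting a corrupted signature costs less on average than completing a full, successful verification. The stage callables are hypothetical placeholders.

    # Illustrative only (not Clue's code): verification split into ordered
    # stages, returning as soon as any stage fails. On corrupted input the
    # loop usually stops early, so the average cost of rejection is lower
    # than the cost of a complete, successful verify.

    def incremental_verify(window_packets, signature, stages):
        """stages: ordered callables, each returning True or False."""
        for stage in stages:
            if not stage(window_packets, signature):
                return False  # reject without running the remaining stages
        return True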
The only significant difference between the eight-packet times and the single-packet times occurs when signing a packet using precomputed values, and it arises from hashing the extra data in the additional packets. Note, however, that this cost is still roughly two orders of magnitude less than any other operation, so we do not observe any additional impact on bulk throughput. As a result, amortizing the attribution operations over multiple packets is a key mechanism for reducing receive overhead. In the experiments discussed in the following section, we show that large windows combined with precomputed signatures can dramatically improve performance over basic Sign and Verify alone.
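To make the amortization argument concrete, the following back-of-the-envelope model (ours, not the authors' code) divides the per-window operation times from the table below by the window size. It assumes one signature and one verification cover an entire eight-packet window, which is how we read the eight-packet column.

    # Rough per-packet cost model (not a measurement): per-window operation
    # times from the table, divided by the number of packets they cover.

    WINDOW = 8

    # Times in milliseconds, taken from the table's eight-packet column.
    PRECOMPUTED_SIGN_MS = 0.058    # sender, with precomputation off the critical path
    INCREMENTAL_VERIFY_MS = 16.3   # receiver, one verification for the whole window

    sender_per_packet = PRECOMPUTED_SIGN_MS / WINDOW      # ~0.007 ms/packet
    receiver_per_packet = INCREMENTAL_VERIFY_MS / WINDOW  # ~2.0 ms/packet

    # Compare with the one-packet column, where every packet pays the full
    # 6.17 ms sign and 15.7 ms verify on its own.
    print(f"sender:   {sender_per_packet:.3f} ms/packet")
    print(f"receiver: {receiver_per_packet:.2f} ms/packet")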
Figure 2. TCP throughput performance for combined optimizations; y-axis is in log scale.
[Plot: throughput (Mbps, log scale, 0.1–1,000) versus link RTT (ms, 0–200) for the Linux, Proxy, Precomp+Async+Win-64, Precomp+Async+Adaptive Win, Precomp+Async+Win-8, and Sign+Verify configurations.]
Table. Overheads of cryptographic operations for both one- and eight-packet windows.

Operation                        1 packet (ms)   8 packets (ms)
Sign                                  6.17            6.19
Precomputed sign                      0.022           0.058
Precomputation                        6.14            6.14
Verify                               15.7            15.7
Incremental verify                   15.6            16.3
Corrupted incremental verify          4.85            4.83
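To connect these per-packet times to achievable bandwidth, the toy calculation below (ours, not the article's) assumes 1,500-byte packets and that serialized per-packet cryptographic work is the only bottleneck; real transfers pay additional per-packet costs, so measured throughput sits below these ceilings.

    # Toy throughput ceiling (not a measurement): if each packet of
    # PACKET_BYTES pays t_ms milliseconds of serialized crypto, bulk
    # throughput cannot exceed the packet's bits divided by that time.
    # Assumes 1,500-byte packets and ignores all other per-packet costs.

    PACKET_BYTES = 1500

    def ceiling_mbps(t_ms):
        return PACKET_BYTES * 8 / (t_ms / 1000.0) / 1e6

    # An un-amortized 15.7 ms verify alone caps throughput below ~0.8 Mbps;
    # amortized over an 8-packet window (~2 ms/packet) the ceiling rises to
    # roughly 6 Mbps. Extra per-packet work pushes the real bound lower.
    print(ceiling_mbps(15.7), ceiling_mbps(16.3 / 8))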
warder at typical Internet RTTs. While
privacy-preserving attribution has a
non-negligible effect on bulk throughput on today’s client systems, the cost
is not prohibitive and will continue
decreasing over time, as CPU performance increases more quickly than
typical Internet bandwidth.
We conduct ttcp benchmarks between the sender and receiver, forwarding their traffic through a delay host. For every test configuration, we run each individual transfer for at least 20 seconds. We require the sender to transfer all its data before it closes the connection, timing the transfer from when the sender connects to the receiver to when the sender receives the FIN from the receiver.
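The timing convention can be sketched as follows; this is a minimal stand-in for the ttcp harness, not the authors' code. The clock starts when the sender initiates the connection and stops when the sender observes the receiver's FIN, which surfaces as recv() returning an empty read.

    # Minimal sketch of the sender-side timing described above (not the
    # authors' harness): time from connect until the receiver's FIN is seen.
    import socket
    import time

    def timed_transfer(host, port, payload):
        start = time.monotonic()                      # clock starts at connect
        with socket.create_connection((host, port)) as s:
            s.sendall(payload)                        # transfer all data
            s.shutdown(socket.SHUT_WR)                # signal end of data
            while s.recv(65536):                      # drain until the FIN:
                pass                                  # recv() returns b"" then
        elapsed = time.monotonic() - start            # clock stops at the FIN
        return len(payload) * 8 / elapsed / 1e6       # goodput in Mbps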
Figure 2 outlines the results of the experiments for a number of configurations. We vary the round-trip time (RTT) between sender and receiver on the x-axis and plot the throughput achieved using the ttcp application benchmark on the y-axis; note the y-axis is a log scale, each point is the average of five runs, and error bars show the standard deviation.
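For reference, a figure of this shape is typically assembled from the raw runs as in the sketch below; the data layout and values are hypothetical, not the authors' plotting script.

    # Hypothetical plotting sketch: each series is the mean of five runs per
    # RTT with standard-deviation error bars, on a log-scale throughput axis.
    import numpy as np
    import matplotlib.pyplot as plt

    def add_series(ax, label, runs_by_rtt):
        """runs_by_rtt maps an RTT in ms to its five throughput samples (Mbps)."""
        rtts = sorted(runs_by_rtt)
        means = [np.mean(runs_by_rtt[r]) for r in rtts]
        stds = [np.std(runs_by_rtt[r]) for r in rtts]
        ax.errorbar(rtts, means, yerr=stds, marker="o", label=label)

    fig, ax = plt.subplots()
    ax.set_yscale("log")                  # Figure 2's y-axis is log scale
    ax.set_xlabel("Link RTT (ms)")
    ax.set_ylabel("Throughput (Mbps)")
    add_series(ax, "example config", {0: [1.0, 1.1, 0.9, 1.05, 0.95]})  # dummy data
    ax.legend()
    plt.show()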
As an upper bound, the “Linux” curve
plots the forwarding rate of the default Linux networking stack on our
hardware. To provide a more realistic
baseline for our Clue implementation,
we also show the performance of an
unmodified user-level Click installation (“Proxy”); Click forwards packets received on its input to its output
without processing. The difference
between “Proxy” and “Linux” shows
the overhead of interposing in the
network stack at user level, including
copying overhead when crossing the
kernel boundary. However, an optimized, kernel-level packet-attribution
implementation need not suffer this
overhead. Though not shown, we also
measured the performance of the provided Click IPsec module, finding its
performance indistinguishable from
the “Proxy” configuration.
The “Sign+Verify” line corresponds
to the baseline performance of Clue
using individual Sign and Verify on
each IP datagram. Given the times required for Sign and Verify, as shown
in the table, one would expect the 29ms
required for the Verify operation to
limit long-term bulk throughput to a
maximum of 0.35Mbps. Not surpris-