DHE_EXPORT. When a server selects DHE_EXPORT for a handshake, it proceeds by issuing a signed ServerKeyExchange
message containing a 512-bit p512, but the structure of this
message is identical to the message sent during standard
DHE ciphersuites. Critically, the signed portion of the server’s message fails to include any indication of the specific
ciphersuite that the server has chosen. Provided that a client offers DHE, an active attacker can rewrite the client’s
ClientHello to offer a corresponding DHE_EXPORT
ciphersuite accepted by the server and remove other ciphersuites
that could be chosen instead. The attacker rewrites the
ServerHello response to replace the chosen DHE_EXPORT
ciphersuite with a matching non-export ciphersuite and forwards the ServerKeyExchange message to the client as is. The
client will interpret the export-grade tuple (p512, g, gb) as valid
DHE parameters chosen by the server and proceed with the
handshake. The client and server have different handshake
transcripts at this stage, but an attacker who can compute
b in close to real time can then derive the master secret and
connection keys to complete the handshake with the client.
There are two remaining challenges in implementing this
active downgrade attack. The first is to compute individual
discrete logarithms in close to real time, and the second is
to delay handshake completion until the discrete logarithm
computation has had time to finish.
3. 3. 512-bit discrete logarithm computations
We modified CADO-NFS19 to implement the number field
sieve discrete logarithm algorithm and applied it to the top
two DHE_EXPORT primes shown in Table 1. Precomputation
took seven days for each prime, after which computing individual logarithms requires a median of 70 seconds.
Precomputation. As illustrated in Figure 1, the precom-
putation phase includes the polynomial selection, sieving,
and linear algebra steps. For this precomputation, we delib-
erately sieved more than strictly necessary. This enabled
two optimizations: first, with more relations obtained from
sieving, we eventually obtain a larger database of known
logarithms, which makes the descent faster. Second, more
sieving relations also yield a smaller linear algebra step,
which is desirable because sieving is much easier to paral-
lelize than linear algebra.
For the polynomial selection and sieving steps, we used
idle time on 2000–3000 microprocessor cores in parallel.
Polynomial selection ran for about 3hrs ( 7,600 core-hours).
Sieving ran for 15hrs ( 21,400 core-hours). This sufficed to
collect 40mn relations of which 28mn were unique, involving 15mn primes of at most 27 bits.
From this data set, we obtained a square matrix with 2.2mn
rows and columns, with 113 nonzero coefficients per row
on average. We solved the corresponding linear system on
a 36-node cluster using the block Wiedemann algorithm. 4, 20
Using unoptimized code, the computation finished in
120hrs ( 60,000 core-hours).
The experiment above was done with CADO-NFS in early
2015. As of 2017, release 2. 3 of CADO-NFS19 performs 20%
faster for sieving, and drastically faster for linear algebra,
since 9,000 core-hours suffice to solve the same linear system on the same hardware. In total, the wall-clock time for
each precomputation was slightly over one week in 2015,
and is reduced to about two days with current hardware and
more recent software.
Descent. Once this precomputation was finished, we
were able to run the final descent step to compute individual
discrete logarithms in about a minute. We implemented the
descent calculation in a mix of Python and C. On average,
computing individual logarithms took about 70sec, but the
time varied from 34sec to 206sec on a server with two 18-core
Intel Xeon E5-2699 CPUs. For purposes of comparison, a
single 512-bit RSA factorization using the CADO-NFS implementation takes about four days of wall-clock time on the
computer used for the descent. 19
3. 4. Active attack implementation
The main challenge in performing this attack is to compute
the shared secret gab before the handshake completes in
order to forge a Finished message from the server. With our
descent implementation, the computation takes an average of 70sec, but there are several ways an attacker can work
around this delay:
Non-browser clients. Different TLS clients impose different time limits, after which they kill the connection.
Command-line clients such as curl and git have long or
no timeouts, and we can hijack their connections without
TLS warning alerts. Web browsers tend to have shorter
timeouts, but we can keep their connections alive by sending TLS warning alerts, which are ignored by the browser
but reset the handshake timer. For example, this allows us
to keep Firefox TLS connections alive indefinitely.
Ephemeral key caching. Many TLS servers do not use a
fresh value b for each connection, but instead compute gb
once and reuse it for multiple negotiations. For example,
F5 BIG-IP load balancers will reuse gb by default. Microsoft
Schannel caches gb for two hours — this setting is hard-coded. For these servers, an attacker can compute the discrete logarithm of gb from one connection and use it to
attack later handshakes.
certS, sign(skS, [cr | sr | p512 | g | gb])
(ms, k1, k2) = kdf(gab, cr | sr) b = dlog(gb mod p512)
(ms, k1, k2) = kdf(gab, cr | sr)
authenc(k1, Dataf s)
MitM Server S
Figure 2. The Logjam attack. A man-in-the-middle can force TLS
clients to use export-strength Diffie-Hellman with any server that
allows DHE_EXPORT. Then, by finding the 512-bit discrete log, the
attacker can learn the session key and arbitrarily read or modify the
contents. Datafs refers to False Start application data that some TLS
clients send before receiving the server’s Finished message.