JANUARY 2019 | VOL. 62 | NO. 1 | COMMUNICATIONS OF THE ACM 111
of 80. 7 If we optimistically assume that a similar reduction
can be achieved for discrete logarithm, the hardware cost to
perform the linear algebra for DH-1024 in one year is plausibly on the order of $5mn.
Combining these estimates, special-purpose hardware
that can perform the precomputation for one 1024-bit group
per year would cost roughly $13mn. This is much less than
the “hundreds of millions of dollars” that we conservatively
estimated in 2015, making it even more likely that nation-state adversaries have implemented the attack.
To put this dollar figure in context, the FY 2012 budget for
the U.S. Consolidated Cryptologic Program (which includes NSA)
was $10.5bn. 22 The 2013 budget request, which prioritized investment in “groundbreaking cryptanalytic capabilities to defeat
adversarial cryptography and exploit internet traffic” included
notable $100mn+ increases in two programs under Cryptanalysis
& Exploitation Services: “Cryptanalytic IT Systems” (to $247mn),
and the cryptically named “PEO Program C” (to $360mn). 22
4. 2. Is NSA breaking 1024-bit Diffie-Hellman?
Our calculations suggest that it is plausibly within NSA’s
resources to have performed number field sieve precomputations for a small number of 1024-bit Diffie-Hellman
groups. This would allow them to break any key exchanges
made with those groups in close to real time. If true, this
would answer one of the major cryptographic questions
raised by the Edward Snowden leaks: How is NSA defeating
the encryption for widely used VPN protocols?
Virtual private networks are widely used for tunneling
business or personal traffic across potentially hostile networks. We focus on the IPsec VPN protocol using the IKE
protocol for key establishment and parameter negotiation
and the Encapsulating Security Payload (ESP) protocol for
protecting packet contents.
IKE. There are two versions, IKEv1 and IKEv2, which differ in message structure but are conceptually similar. For
the sake of brevity, we will use IKEv1 terminology. 10
Each IKE session begins with a Phase 1 handshake in
which the client and server select a Diffie-Hellman group
from a small set of standardized parameters and perform a
key exchange to establish a shared secret. The shared secret
is combined with other cleartext values transmitted by each
side, such as nonces and cookies, to derive a value called
SKEYID. In IKEv1, SKEYID also incorporates a Pre-Shared
Key (PSK) used for authentication.
The resulting SKEYID is used to encrypt and authenticate
a Phase 2 handshake. Phase 2 establishes the parameters
and key material, KEYMAT, for protecting the subsequently
tunneled traffic. Ultimately, KEYMAT is derived from SKEYID,
additional nonces, and the result of an optional Phase 2
NSA’s VPN exploitation process. Documents published
by Der Spiegel describe NSA’s ability to decrypt VPN traffic
using passive eavesdropping and without message injection
or man-in-the-middle attacks on IPsec or IKE. Figure 3 illustrates the flow of information required to decrypt the tunneled traffic.
When the IKE/ESP messages of a VPN of interest are
collected, the IKE messages and a small amount of ESP
For 1024-bit precomputation, the total time complex-
ity can be expected to increase by a factor of 1220 using the
complexity formula, while space complexity increases by its
square root, approximately 35. These ratios are relevant for
both factorization and discrete logarithm since they have the
same asymptotic behavior. For DH-1024, we get a total cost
estimate for the precomputation of about 6mn core-years. In
practice, it is not uncommon for estimates based merely on
the complexity formula to be off by a factor of 10. Estimates
of Table 2 must therefore be considered with due caution.
For 1024-bit descent, we experimented with our early-abort implementation to inform our estimates for descent
initialization, which should dominate the individual discrete logarithm computation. For a random target in Oakley
Group 2, initialization took 22 core-days, and yielded a few
primes of at most 130 bits to be descended further. In twice
this time, we reached primes of about 110 bits. At this point,
we were certain to have bootstrapped the descent and could
continue down to the smoothness bound in a few more core-days if proper sieving software were available. Thus we estimate that a 1024-bit descent would take about 30 core-days,
once again easily parallelizable.
Costs in hardware. Although several million core-years is
a massive computational effort, it is not necessarily out of
reach for a nation-state. At this scale, significant cost savings
could be realized by developing application-specific hardware
given that sieving is a natural target for hardware implementation. To our knowledge, the best prior description of an
Application-Specific Integrated Circuit (ASIC) implementation of 1024-bit sieving is the 2007 work of Geiselmann
and Steinwandt. 8 Updating their estimates for modern
techniques and adjusting parameters for discrete logarithm
allows us to extrapolate the financial and time costs.
We increase their chip count by a factor of ten to sieve more
and save on linear algebra as above, giving an estimate of 3mn
chips to complete sieving in one year. Shrinking the dies from
the 130 nanometer technology node used in the paper to a
more modern size reduces costs as transistors are cheaper at
newer technologies. With standard transistor costs and utilization, it would cost about $2 per chip to manufacture after
fixed design and tape-out costs of roughly $2mn. 14 This suggests that an $8mn investment would buy enough ASICs to
complete the DH-1024 sieving precomputation in one year.
Since a step of descent uses sieving, the same hardware could
likely be reused to speed calculations of individual logarithms.
Estimating the financial cost for the linear algebra is
more difficult since there has been little work on designing
chips that are suitable for the larger fields involved in discrete logarithm. To derive a rough estimate, we can begin
with general purpose hardware and the core-year estimate
from Table 2. Using the 300,000 CPU core Titan supercom-puter it would take four years to complete the 1024-bit linear algebra stage (notwithstanding the fact that estimates
from Table 2 are known to be extremely coarse, and could be
optimistic by a factor of maybe 10). Titan was constructed
in 2012 for $94mn, suggesting a cost of under $400mn in
supercomputers to finish this step in a year. In the context of
factorization, moving linear algebra from general purpose
CPUs to ASICs has been estimated to reduce costs by a factor