After validation, we are left with 628,692 leaf certificates
(40.0% of all certificates advertising Alexa domains and 3.2%
of all certificates). We refer to this set of certificates, each of
which has a valid chain, as the Leaf Set. We refer to the set
of all CA certificates on these chains (not including the leaf
certificates) as the CA Set, which contains 910 unique certificates. The Leaf Set certificates cover 166,124 (16.6%) of the
Alexa Top-1M domains. These are the certificates and
chains that we use in the remainder of the paper.
3.3. Collecting CRLs
To determine if and when certificates were revoked, we
extracted the CRL URLs from all Leaf Set certificates. We
found that 626,659 (99.7%) of these certificates include at
least one well-formed, reachable CRL URL. For certificates
that included multiple CRL URLs, we included them all. We
found a total of 1,386 unique CRL URLs (most certificates
use a unified CRL provided by the signing CA, so the small
number of CRLs is not surprising). We downloaded all of
these CRLs on May 6, 2014, and found 45,268 (7.2%) of the
Leaf Set certificates to be revoked.
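The revocation lookup described above can be sketched as a set-membership test. This is a minimal illustration, not the pipeline used in the study: it assumes each CRL has already been downloaded and parsed into a set of revoked serial numbers, and the `Leaf` type, the function names, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Leaf:
    """Hypothetical record for one Leaf Set certificate."""
    serial: int
    crl_urls: tuple  # every CRL URL listed in the certificate


def is_revoked(leaf, crls):
    """Return True if the leaf's serial appears in any CRL it points to.

    crls maps a CRL URL to the set of serial numbers that CRL lists.
    URLs that could not be fetched are simply absent from the mapping.
    """
    return any(leaf.serial in crls.get(url, set()) for url in leaf.crl_urls)


def revoked_fraction(leaves, crls):
    """Fraction of the given leaves that appear in at least one CRL."""
    if not leaves:
        return 0.0
    return sum(is_revoked(leaf, crls) for leaf in leaves) / len(leaves)


# Toy data with made-up URLs and serials (assumptions for illustration).
crls = {
    "http://crl.example-ca.test/a.crl": {1001, 1002},
    "http://crl.example-ca.test/b.crl": {2001},
}
leaves = [
    Leaf(1001, ("http://crl.example-ca.test/a.crl",)),
    Leaf(1500, ("http://crl.example-ca.test/a.crl",)),
    Leaf(2001, ("http://crl.example-ca.test/a.crl",
                "http://crl.example-ca.test/b.crl")),
    Leaf(3000, ("http://crl.example-ca.test/unreachable.crl",)),
]
print(revoked_fraction(leaves, crls))
```

Keying the lookup on the CRL URLs listed in each certificate matters because serial numbers are only unique per issuer; checking a serial against every downloaded CRL would risk false matches.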
We also collected the CRL URLs for all certificates in the
CA Set. We found that 884 (97.1%) of the certificates in the
CA Set included a reachable CRL; together, these certificates
referenced 246 unique reachable CRL URLs. We downloaded
these CRLs on May 6, 2014, as well. We found a total of seven
CA certificates that were revoked, which invalidated 60 certificates in the Leaf Set (< 0.01%).
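A revoked CA certificate invalidates every leaf whose chain passes through it. The following sketch shows that propagation step under the assumption that each leaf's chain is available as a list of CA identifiers; the function name and the identifiers are hypothetical.

```python
def invalidated_leaves(chains, revoked_cas):
    """Return the leaves whose chain contains at least one revoked CA.

    chains maps a leaf identifier to the list of CA identifiers on its chain.
    revoked_cas is the set of CA identifiers found on a CRL.
    """
    return {
        leaf
        for leaf, cas in chains.items()
        if any(ca in revoked_cas for ca in cas)
    }


# Toy chains with made-up identifiers (assumptions for illustration).
chains = {
    "leaf-1": ["intermediate-A", "root-X"],
    "leaf-2": ["intermediate-B", "root-X"],
    "leaf-3": ["intermediate-A", "intermediate-C", "root-Y"],
}
print(sorted(invalidated_leaves(chains, {"intermediate-A"})))
```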
3.4. Inferring the Heartbleed vulnerability
Finally, we wish to determine if a site was ever vulnerable to
the Heartbleed OpenSSL vulnerability (and if it continued
to be vulnerable at the end of the study). Doing so allows
us to reason about whether the site operators should have
reissued their SSL certificate(s) and revoked their old one(s).
Determining if a host is currently vulnerable to Heartbleed
is relatively easy, as one can simply send an improperly formatted SSL heartbeat message with a payload_length of
0 to test for vulnerability without exfiltrating any data.6
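To make the shape of such a probe concrete, the sketch below builds the bytes of a heartbeat request with a payload_length of 0 and no padding. The field layout follows RFC 6520, which requires at least 16 bytes of padding, so omitting it is what makes the message malformed. The exact bytes used in the study are not given in the text, so the TLS version, the function name, and the choice to omit padding entirely are assumptions. The code only constructs the record; it does not open a connection, and a real probe would send it only after the TLS handshake has started.

```python
import struct

TLS_CONTENT_TYPE_HEARTBEAT = 24  # record content type assigned by RFC 6520
HEARTBEAT_REQUEST = 1            # heartbeat message type for a request


def build_probe(tls_version=(3, 2)):
    """Build a TLS record carrying a zero-length heartbeat request.

    tls_version is (major, minor); (3, 2) is TLS 1.1, an assumed default.
    """
    # Heartbeat message: 1-byte type, 2-byte payload_length, then nothing.
    # No payload is requested, so no server memory can be echoed back.
    message = struct.pack("!BH", HEARTBEAT_REQUEST, 0)
    # Record header: content type, version major, version minor, length.
    header = struct.pack(
        "!BBBH",
        TLS_CONTENT_TYPE_HEARTBEAT,
        tls_version[0],
        tls_version[1],
        len(message),
    )
    return header + message


print(build_probe().hex())  # prints 1803020003010000
```

Under this construction, a server that silently discards the record is behaving as a patched implementation should, whereas a reply suggests the length check is missing.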
However, determining if a site was vulnerable in the
past—but has since updated their OpenSSL code—is more
challenging. We observe that only three of the common TLS
coverage surrounding Heartbleed reduces the likelihood
that administrators failed to take action because they were
unaware of the vulnerability.
3. DATA AND METHODS
We now describe the data sets that we collected and our
methodology for determining a host’s SSL certificate, when
it was in use, if and when the certificate was revoked, and
if the host was (or is still) vulnerable to the Heartbleed bug.
3.1. Certificate data source
We obtain our collection of SSL certificates from (roughly)
weekly scans of the entire IPv4 address space made available
by Rapid7.16 We use scans collected between October 30,
2013 and April 28, 2014. There are a total of 28 scans during
this period, giving an average of 6.7 days (with a minimum
of 3 days and maximum of 9 days) between successive scans.
The scans found an average of 26.9 million hosts responding to SSL handshakes on port 443 (an average of 9.12% of
the entire IPv4 address space). Across all of the scans, we
observed a total of 19,438,865 unique certificates (including
all leaf and CA certificates). In the sections below, we
describe how we filtered and validated this data set; an overview of the process is provided in Figure 1.
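The average spacing between scans follows directly from the first and last scan dates and the scan count: 28 scans leave 27 gaps. A short check, using only the figures stated above (the function name is illustrative):

```python
from datetime import date

first_scan = date(2013, 10, 30)
last_scan = date(2014, 4, 28)
num_scans = 28


def mean_gap_days(first, last, scans):
    """Mean days between successive scans: total span over number of gaps."""
    return (last - first).days / (scans - 1)


print((last_scan - first_scan).days)                               # prints 180
print(round(mean_gap_days(first_scan, last_scan, num_scans), 1))   # prints 6.7
```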
3.2. Filtering data
To focus on web destinations that are commonly accessed
by users, we use the Alexa Top-1M domains1 as observed on
April 28, 2014. We first extract all leaf (non-CA) certificates
that advertise a Common Name (CN) that is in one of the
domains in the Alexa list (e.g., we would include certificates
for facebook.com, www.facebook.com, as well as *.dev.facebook.com). This set represents 1,573,332 certificates (8.1%
of all certificates).
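One way to implement the Common Name filter is to strip any leading wildcard label and then test the name and each of its parent domains against the Alexa list. The sketch below is a plausible reading of the matching rule illustrated by the facebook.com example, not the authors' code; the function name and the toy domain set are hypothetical.

```python
def matching_alexa_domain(common_name, alexa_domains):
    """Return the Alexa domain that covers common_name, or None."""
    name = common_name.lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]  # a wildcard covers subdomains of the remainder
    labels = name.split(".")
    # Try the full name first, then successively shorter parent domains.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in alexa_domains:
            return candidate
    return None


alexa = {"facebook.com", "example.org"}  # stand-in for the Top-1M list
for cn in ["facebook.com", "www.facebook.com",
           "*.dev.facebook.com", "notfacebook.com"]:
    print(cn, "->", matching_alexa_domain(cn, alexa))
```

Comparing whole labels rather than using a plain suffix test keeps names such as notfacebook.com from matching facebook.com.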
Unfortunately, despite leaf certificates having a CN in the
Alexa list, many may not be valid (e.g., expired certificates,
forged certificates, certificates signed by an unrecognized
root, etc.). We removed these invalid certificates4 by running
openssl verify on each certificate (and its corresponding chain). We configure OpenSSL to trust the root CA certificates included by default in the OS X 10.9.2 root store12; this
includes 222 unique root certificates.
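The validation step can be sketched as a thin wrapper around the openssl verify command. The exact invocation used in the study is not given, so this is an assumed form: -CAfile points at a PEM bundle of the trusted roots, -untrusted at a PEM bundle of the intermediate certificates from the scan, and all file names and function names are hypothetical.

```python
import subprocess


def build_verify_command(leaf_path, intermediates_path, roots_path):
    """Assemble an `openssl verify` invocation for one leaf certificate."""
    return [
        "openssl", "verify",
        "-CAfile", roots_path,             # trusted root certificates (PEM)
        "-untrusted", intermediates_path,  # candidate intermediates (PEM)
        leaf_path,
    ]


def is_valid(leaf_path, intermediates_path, roots_path):
    """Return True if openssl reports that the chain verifies.

    Both the exit status and the ': OK' line are checked, as a precaution
    in case a given OpenSSL version exits with 0 on a failed verification.
    """
    result = subprocess.run(
        build_verify_command(leaf_path, intermediates_path, roots_path),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and result.stdout.strip().endswith(": OK")


print(" ".join(build_verify_command("leaf.pem", "chain.pem", "roots.pem")))
```

Running `is_valid` requires the openssl binary on the PATH and the three PEM files on disk; `build_verify_command` can be inspected without either.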
[Figure 1 diagram labels: October 30, 2013; February 5, 2014; 9,640,973; February 10, 2014; April 28, 2014; CRL URLs; Revoked certs; May 6, 2014.]
Figure 1. Workflow from raw scans of the IPv4 address space to valid certificates (and corresponding CRLs) from the Alexa Top-1M domains.
The Rapid7 data after February 5, 2014 did not include the intermediate (CA) certificates, necessitating additional steps and data to perform