In this context, our work seeks to better understand the
traceability of Bitcoin flows. Importantly, our goal is not to
generally de-anonymize all Bitcoin users—as the abstract
protocol design itself dictates that this should be impossible—
but rather to identify certain idioms of use present in concrete
Bitcoin network implementations that erode the anonymity
of the users who engage in them. We stress that our work
was done at a specific point in the evolution of Bitcoin, and
that as idioms of use change, the techniques we develop may
need to adapt as well.
Our approach is based on the availability of the Bitcoin
block chain: a replicated graph data structure that encodes
all Bitcoin activity, past and present, in terms of the public
digital signing keys party to each transaction. However, since
each of these keys carries no explicit information about ownership, our analysis depends on imposing additional structure on the transaction graph.
Our methodology has two phases. First, in Section 3, we
describe a re-identification attack wherein we open accounts
and make purchases from a broad range of known Bitcoin
merchants and service providers. Since one endpoint of
the transaction is known (i.e., we know which public key
we used), we are able to positively label the public key on
the other end as belonging to the service; we augment this
attack by crawling Bitcoin forums for “self-labeled” public
keys (e.g., where an individual or organization explicitly
advertises a key as their own). Next, in Section 4, we build
on past efforts [1, 9, 10, 12] to cluster public keys based on evidence of shared spending authority. This clustering allows
us to amplify the results of our re-identification attack: if
we labeled one public key as belonging to a particular service, we can now transitively taint the entire cluster containing this public key as belonging to that service as well.
The result is a condensed graph, in which nodes represent
entire users and services rather than individual public keys.
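To make the amplification step concrete, the following is an illustrative Python sketch rather than the implementation used in our analysis. It assumes a single clustering rule (addresses that appear together as inputs to one transaction are treated as sharing an owner); the transaction layout (`inputs`/`outputs` lists), the function names, and the label `ExchangeX` are hypothetical and chosen only for the example.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest over hashable keys (here: address strings)."""

    def __init__(self):
        self.parent = {}

    def find(self, key):
        # Unseen keys start out as their own singleton cluster.
        self.parent.setdefault(key, key)
        root = key
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walk at the root.
        while self.parent[key] != root:
            self.parent[key], key = root, self.parent[key]
        return root

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_addresses(transactions):
    """Merge addresses that are co-spent as inputs (assumed heuristic)."""
    uf = UnionFind()
    for tx in transactions:
        inputs = tx["inputs"]
        for addr in inputs:
            uf.find(addr)
        for addr in inputs[1:]:
            uf.union(inputs[0], addr)
        for addr in tx["outputs"]:
            uf.find(addr)  # outputs are registered but never merged here
    clusters = defaultdict(set)
    for addr in list(uf.parent):
        clusters[uf.find(addr)].add(addr)
    return list(clusters.values())


def propagate_labels(clusters, known_labels):
    """Taint a whole cluster with the one service label found inside it."""
    labelled = {}
    for cluster in clusters:
        names = {known_labels[a] for a in cluster if a in known_labels}
        if len(names) == 1:  # skip unlabelled or conflicting clusters
            (name,) = names
            for addr in cluster:
                labelled[addr] = name
    return labelled


# Hypothetical data: A was labelled by transacting with the service.
transactions = [
    {"inputs": ["A", "B"], "outputs": ["C"]},
    {"inputs": ["B", "D"], "outputs": ["E"]},
    {"inputs": ["F"], "outputs": ["A"]},
]
clusters = cluster_addresses(transactions)
labels = propagate_labels(clusters, {"A": "ExchangeX"})
# labels == {"A": "ExchangeX", "B": "ExchangeX", "D": "ExchangeX"}
```

In this toy example a single labelled key (A) is enough to attribute B and D to the same service, while C, E, and F remain unlabelled because nothing ties them to A's cluster.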
From this data, we examine the suitability of Bitcoin for
hiding large-scale illicit transactions. Using the dissolution
of a large Silk Road wallet and notable Bitcoin thefts as case
studies, we argue that an agency with subpoena power would
be well placed to identify who is paying money to whom.
Indeed, we argue that the increasing dominance of a small
number of Bitcoin institutions (most notably services that
perform currency exchange), coupled with the public nature
of transactions and our ability to label monetary flows to
major institutions, ultimately makes Bitcoin unattractive for
high-volume illicit use such as money laundering.
2. BITCOIN BACKGROUND
The heuristics that we use to cluster pseudonyms depend on
the structure of the Bitcoin protocol, so we first describe it
here, and briefly mention the anonymity that it is intended
to provide. Additionally, much of our analysis discusses the
“major players” and different categories of Bitcoin-based
services, so we also present a more high-level overview of
Bitcoin participation.
2.1. Bitcoin protocol description
Bitcoin is a decentralized electronic currency, introduced
by (the pseudonymous) Satoshi Nakamoto in 2008 [7] and
deployed on January 3, 2009. Briefly, a bitcoin can be thought
of as a chain of transactions from one owner to the next,
where owners are identified by a public key—from here on
out, an address—that serves as a pseudonym; that is, users
can use any number of addresses and their activity using one
set of addresses is not inherently tied to their activity using
another set, or to their real-world identity. In each transaction,
the previous owner signs (using the secret signing key
corresponding to his address) a hash of the transaction in
which he received the bitcoins and the address of the next
owner. (In fact, transactions can have many input and output
addresses, a fact that we exploit in our clustering heuristics
in Section 4, but for simplicity we restrict ourselves here
to the case of a single input and output.) This signature (i.e.,
transaction) can then be added to the set of transactions
that constitutes the bitcoin; because each of these transactions
references the previous transaction (i.e., in sending
bitcoins, the current owner must specify where they came
from), the transactions form a chain. To verify the validity of
a bitcoin, a user can check the validity of each of the signatures
in this chain.
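The single-input, single-output chain described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the Bitcoin wire format: Bitcoin signs with ECDSA over secp256k1, whereas this sketch substitutes Lamport one-time signatures built from SHA-256 so that it runs with the standard library alone. The `Transaction` layout, the function names, and the all-zero `prev_hash` of the origin transaction are hypothetical.

```python
import hashlib
import os
from dataclasses import dataclass

HASH_LEN = 32  # bytes in a SHA-256 digest
N_BITS = 256   # bits in a SHA-256 digest


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def keygen():
    """Return (secret, public). STAND-IN: Lamport one-time keys, not ECDSA."""
    secret = [[os.urandom(HASH_LEN) for _ in range(N_BITS)] for _ in range(2)]
    public = b"".join(sha256(s) for row in secret for s in row)
    return secret, public


def _digest_bits(message: bytes):
    digest = int.from_bytes(sha256(message), "big")
    return [(digest >> i) & 1 for i in range(N_BITS)]


def sign(secret, message: bytes):
    # Reveal one secret value per digest bit; a key must sign only once.
    return tuple(secret[bit][i] for i, bit in enumerate(_digest_bits(message)))


def verify(public: bytes, message: bytes, signature) -> bool:
    if len(public) != 2 * N_BITS * HASH_LEN or len(signature) != N_BITS:
        return False
    for i, bit in enumerate(_digest_bits(message)):
        start = (bit * N_BITS + i) * HASH_LEN
        if sha256(signature[i]) != public[start:start + HASH_LEN]:
            return False
    return True


@dataclass(frozen=True)
class Transaction:
    prev_hash: bytes  # hash of the transaction in which the sender received the coin
    recipient: bytes  # address (public key) of the next owner
    signature: tuple  # previous owner's signature over prev_hash + recipient

    def tx_hash(self) -> bytes:
        return sha256(self.prev_hash + self.recipient + b"".join(self.signature))


def transfer(prev_tx, sender_secret, recipient):
    """The current owner signs the previous transaction hash and the next address."""
    prev_hash = prev_tx.tx_hash()
    return Transaction(prev_hash, recipient, sign(sender_secret, prev_hash + recipient))


def verify_chain(chain) -> bool:
    """Check every link; chain[0] is taken on trust as the coin's origin."""
    for prev_tx, tx in zip(chain, chain[1:]):
        if tx.prev_hash != prev_tx.tx_hash():
            return False  # does not reference the previous transaction
        if not verify(prev_tx.recipient, tx.prev_hash + tx.recipient, tx.signature):
            return False  # not signed by the owner named in the previous link
    return True


alice_sk, alice = keygen()
bob_sk, bob = keygen()
carol_sk, carol = keygen()
origin = Transaction(b"\x00" * HASH_LEN, alice, ())  # hypothetical coin creation
to_bob = transfer(origin, alice_sk, bob)
to_carol = transfer(to_bob, bob_sk, carol)
chain_ok = verify_chain([origin, to_bob, to_carol])
```

Because each transfer is checked against the address named in the preceding transaction, a transfer signed by anyone other than the current owner fails verification, as does a copy of a valid transaction with its recipient swapped.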
To prevent double spending, it is necessary for each
user in the system to be aware of all such transactions.
Double spending can then be identified when a user
attempts to transfer a bitcoin after he has already done so.
To determine which transaction came first, transactions are
grouped into blocks, which serve to timestamp the transactions they contain and vouch for their validity. Blocks are
themselves formed into a chain, with each block referencing the previous one (and thus further reinforcing the
validity of all previous transactions). This process yields a
block chain, which is then publicly available to every user
within the system.
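As a rough illustration of why a shared, ordered history suffices to expose double spending, consider the following Python sketch. It is a toy model under our own assumptions: blocks are plain dictionaries hashed via their JSON encoding, each transaction names the single earlier transaction it spends, and the field names (`prev`, `txs`, `id`, `spends`, `to`) are hypothetical. In the deployed system a conflicting transaction would be rejected rather than recorded; here it is recorded so that the scan has something to find.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full contents, including its pointer to the previous block."""
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()


def make_block(prev_hash, transactions):
    return {"prev": prev_hash, "txs": transactions}


def chain_is_linked(chain):
    """Each block must reference the hash of the block before it."""
    return all(b["prev"] == block_hash(p) for p, b in zip(chain, chain[1:]))


def find_double_spends(chain):
    """Scan blocks in order; the first spend of a coin wins, later ones conflict."""
    spent_by = {}   # spent transaction id -> id of the transaction that spent it
    conflicts = []  # (offending tx id, earlier tx id, block height)
    for height, block in enumerate(chain):
        for tx in block["txs"]:
            coin = tx["spends"]
            if coin is None:
                continue  # newly generated coin, nothing is being spent
            if coin in spent_by:
                conflicts.append((tx["id"], spent_by[coin], height))
            else:
                spent_by[coin] = tx["id"]
    return conflicts


genesis = make_block(None, [{"id": "t0", "spends": None, "to": "alice"}])
block1 = make_block(block_hash(genesis), [{"id": "t1", "spends": "t0", "to": "bob"}])
block2 = make_block(block_hash(block1), [{"id": "t2", "spends": "t0", "to": "carol"}])
chain = [genesis, block1, block2]
conflicts = find_double_spends(chain)
# conflicts == [("t2", "t1", 2)]: t2 tries to spend t0 after t1 already did
```

The block ordering is what decides which of the two spends came first, and because each block commits to the hash of its predecessor, altering an earlier block breaks every later link.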
This process describes how to transfer bitcoins and
broadcast transactions to all users of the system. Because
Bitcoin is decentralized and there is thus no central authority minting bitcoins, we must also consider how bitcoins
are generated in the first place. In fact, this happens in the
process of forming a block: each accepted block (i.e., each
block incorporated into the block chain) is required to be
such that, when all the data inside the block is hashed,
the hash begins with a certain number of zeroes. To allow
users to find this particular collection of data, blocks contain, in addition to a list of transactions, a nonce. (We simplify the description slightly to ease presentation.) Once
someone finds a nonce that allows the block to have the
correctly formatted hash, the block is then broadcast in
the same peer-to-peer manner as transactions. The system
is designed to generate only 21 million bitcoins in total.
Finding a block currently comes with an attached reward
of 25 BTC; this rate was 50 BTC until November 28, 2012
(block height 210,000), and is expected to halve again in
2016, and eventually drop to 0 in 2140.
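The nonce search can be illustrated with a short Python sketch. It keeps the simplification used in the text (the hash must begin with a number of zeroes) and is not the real block format: Bitcoin applies SHA-256 twice to an 80-byte block header and compares the result against a numeric target. The string encoding of the block, the function names, and the difficulty of four leading hexadecimal zeroes are assumptions made for the example.

```python
import hashlib


def block_digest(prev_hash, transactions, nonce):
    """Hash all of the block's data together with a candidate nonce."""
    payload = f"{prev_hash}|{'|'.join(transactions)}|{nonce}".encode()
    return hashlib.sha256(payload).hexdigest()


def mine(prev_hash, transactions, zero_hex_digits=4, max_nonce=2**32):
    """Try nonces in order until the digest starts with the required zeroes."""
    prefix = "0" * zero_hex_digits
    for nonce in range(max_nonce):
        digest = block_digest(prev_hash, transactions, nonce)
        if digest.startswith(prefix):
            return nonce, digest
    raise RuntimeError("no suitable nonce found below max_nonce")


prev = "00" * 32                       # hypothetical previous block hash
txs = ["alice->bob:1", "bob->carol:1"]  # hypothetical transaction list
nonce, digest = mine(prev, txs)
```

Each additional required hexadecimal zero multiplies the expected number of attempts by 16, whereas checking a claimed nonce always costs a single hash; this asymmetry is what lets every participant cheaply verify a broadcast block.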
The dissemination of information within the Bitcoin network is summarized in Figure 1.
2.2. Participants in the Bitcoin network
In practice, the way in which Bitcoin can be used is much
simpler than the above description might indicate. First,