• Sensitive data is never available in plaintext at the
• If the application requests no relational predicate
filtering on a column, nothing about the data content
leaks (other than its size in bytes). In Section 5, we show
that all or almost all of the most sensitive fields in the
tested applications are in this category.
• If the application requests equality checks on a column, CryptDB’s proxy reveals which items repeat in
that column, but not the actual values.
• If the application requests order checks on a column, the
proxy reveals the order of the elements in the column.
2.2. Threat 2: arbitrary threats
The approach in threat 1 is insufficient when the application
server, the proxy, and the DBMS server infrastructure may all
be arbitrarily compromised. The reason is that an adversary
corrupting the proxy can now get access to the master key
used to encrypt the entire database.
The solution is to use the SQL-aware and adjustable
encryption techniques, but not with a single master key.
Instead, we use per-user keys, derived from the user’s password, each having access to only a subset of the data.
A challenge is that simply encrypting each user’s data
with that user’s password does not work because users
may share data. To permit data sharing, we encrypt each
data item with a new key, and chain these new keys to user
passwords, so that each data item can be decrypted only
through a chain of keys rooted at the password of a user
with legitimate access to that data. To construct a chain of
keys that captures the application’s data privacy and sharing policy, CryptDB requires the developer to provide policy annotations over the application’s SQL schema.
Because queries still execute over encrypted data, the
passive adversary of threat 1 remains at bay. In addition,
even in the face of arbitrary server-side compromises,
CryptDB protects the data of users logged out for the duration of an attack, since none of the components compromised by this attack have access to the keys of those users.
However, an adversary that compromises the application
server or proxy can gain access to data of users logged in
during the attack, by obtaining their keys. By “duration of
an attack,” we mean the interval from the start of a compromise until any trace of the compromise has been erased
from the system.
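The key chaining idea can be illustrated with a minimal sketch. It is not CryptDB's implementation: the PBKDF2 key derivation, the toy HMAC-based keystream used for wrapping, and all names (`derive_user_key`, `wrap`, `unwrap`, `key_chain`, `read_item`) are assumptions made for illustration, and the chain here has only two links (password to user key, user key to data item key).

```python
import hashlib
import hmac
import os


def derive_user_key(password: str, salt: bytes) -> bytes:
    # Assumption: PBKDF2-HMAC-SHA256 as the password-based key derivation.
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream (HMAC-SHA256 in counter mode); illustration only.
    out = b""
    counter = 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def wrap(key: bytes, secret: bytes) -> bytes:
    # Encrypt `secret` under `key` with a fresh random nonce.
    nonce = os.urandom(16)
    pad = _keystream(key, nonce, len(secret))
    return nonce + bytes(a ^ b for a, b in zip(secret, pad))


def unwrap(key: bytes, blob: bytes) -> bytes:
    # Invert wrap(): split off the nonce and remove the pad.
    nonce, body = blob[:16], blob[16:]
    pad = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, pad))


# Each data item gets a fresh key; that key is wrapped once per authorized user.
salts = {"alice": os.urandom(16), "bob": os.urandom(16)}
passwords = {"alice": "alice-password", "bob": "bob-password"}

item_key = os.urandom(32)
stored_item = wrap(item_key, b"shared message")
key_chain = {
    user: wrap(derive_user_key(passwords[user], salts[user]), item_key)
    for user in ("alice", "bob")
}


def read_item(user: str, password: str) -> bytes:
    # Follow the chain: password -> user key -> item key -> data item.
    user_key = derive_user_key(password, salts[user])
    recovered_item_key = unwrap(user_key, key_chain[user])
    return unwrap(recovered_item_key, stored_item)
```

In this sketch both users can decrypt the shared item through their own chain, while the stored material alone (`stored_item`, `key_chain`, `salts`) does not contain any key in the clear. The toy wrapping provides no integrity protection, so a wrong password simply yields garbage rather than an error.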
3. QUERIES OVER ENCRYPTED DATA
This section describes how CryptDB executes SQL queries
over encrypted data in the face of the threat described in
Section 2.1.
The CryptDB proxy stores a secret master key MK, the
database schema, and the current encryption layer of each
column. The DBMS server sees an anonymized schema (in
which table and column names are replaced by opaque
identifiers), encrypted user data, and some auxiliary tables
used by CryptDB. CryptDB also equips the server with certain user-defined functions (UDFs) that enable the server to
compute on ciphertexts for certain operations.
Processing a query in CryptDB involves four steps:
1. The application issues a query, which the proxy
intercepts and rewrites: it anonymizes each table
and column name, and, using the master key MK,
encrypts each constant in the query with an encryption scheme best suited for the desired operation
(Section 3.1). The proxy also replaces certain operations with UDFs.
2. The proxy checks if the DBMS server should be given
keys to adjust encryption layers before executing the
query, and if so, issues an UPDATE query at the DBMS
server, which invokes a UDF to adjust the encryption
layer of the appropriate columns (Section 3.2).
3. The proxy sends the encrypted query to the server,
which executes it.
4. The server returns the encrypted query result, which
the proxy decrypts and returns to the application.
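Step 1 can be sketched as follows for a single equality query. This is a simplified illustration under stated assumptions, not the proxy's actual code: the schema mapping `SCHEMA`, the key `MASTER_KEY`, and the functions `encrypt_constant` and `rewrite_select` are hypothetical, the query is passed in already parsed, and a keyed HMAC tag stands in for the deterministic encryption of the constant (a real proxy would use a decryptable scheme).

```python
import hashlib
import hmac

# Hypothetical master key and anonymized schema held only by the proxy.
MASTER_KEY = b"demo-master-key"
SCHEMA = {"employees": "table1", "id": "c1", "name": "c2"}


def encrypt_constant(column: str, value: str) -> str:
    # Stand-in for deterministic encryption: a keyed tag bound to the column.
    message = f"{column}|{value}".encode()
    tag = hmac.new(MASTER_KEY, message, hashlib.sha256).hexdigest()[:16]
    return f"x'{tag}'"


def rewrite_select(table: str, column: str, where_col: str, constant: str) -> str:
    # Replace every name with its opaque identifier and encrypt the constant.
    return (
        f"SELECT {SCHEMA[column]} FROM {SCHEMA[table]} "
        f"WHERE {SCHEMA[where_col]} = {encrypt_constant(where_col, constant)}"
    )


# SELECT name FROM employees WHERE id = 23
rewritten = rewrite_select("employees", "name", "id", "23")
```

The rewritten query contains only opaque identifiers and an encrypted constant, so the DBMS server can execute it without learning the table name, the column names, or the value being searched for.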
3.1. SQL-aware encryption
We now describe the encryption methods used in CryptDB,
including a number of existing cryptosystems and a new cryptographic primitive for joins. For each encryption method,
we explain the security property that CryptDB requires from
it, its functionality, and how it is implemented.
Random (RND). RND provides the maximum security in
CryptDB: indistinguishability under an adaptive chosen-plaintext attack (IND-CPA). This scheme is probabilistic,
meaning that two equal values are mapped to different ciphertexts with high probability. On the other hand, RND does not
allow any computation to be performed efficiently on the
ciphertext. An efficient construction of RND is to use a block
cipher like the Advanced Encryption Standard (AES) [6] or Blowfish
in CBC mode together with a random initialization vector (IV).
(We mostly use AES, except for integer values, where we use
Blowfish for its 64-bit block size because the 128-bit block size
of AES would cause the ciphertext to be significantly longer.)
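The probabilistic behavior of RND can be illustrated with a short sketch. Because the Python standard library has no block cipher, the code below swaps in a toy stream construction (HMAC-SHA256 in counter mode with a random IV) in place of AES or Blowfish in CBC mode; the function names `rnd_encrypt` and `rnd_decrypt` are assumptions, and the construction is for illustration only.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256 over (IV, counter). Not AES-CBC.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def rnd_encrypt(key: bytes, plaintext: bytes) -> bytes:
    # A fresh random IV per encryption makes equal plaintexts look different.
    iv = os.urandom(16)
    pad = _keystream(key, iv, len(plaintext))
    return iv + bytes(a ^ b for a, b in zip(plaintext, pad))


def rnd_decrypt(key: bytes, ciphertext: bytes) -> bytes:
    # The IV is stored in the first 16 bytes of the ciphertext.
    iv, body = ciphertext[:16], ciphertext[16:]
    pad = _keystream(key, iv, len(body))
    return bytes(a ^ b for a, b in zip(body, pad))


key = os.urandom(32)
first = rnd_encrypt(key, b"salary=100")
second = rnd_encrypt(key, b"salary=100")
```

Here `first` and `second` encrypt the same value but differ with overwhelming probability, so a server holding only the ciphertexts cannot tell that the two rows are equal.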
Deterministic (DET). DET enables the server to learn which
encrypted values correspond to the same data value, by
deterministically generating the same ciphertext for the
same plaintext. Therefore, this encryption layer allows
the server to perform equality checks, which means it can
perform selects with equality predicates, equality joins,
GROUP BY, COUNT, DISTINCT, etc.
In cryptographic terms, DET should be a pseudo-random
permutation (PRP) [9]. We use Blowfish or AES in CMC mode [10]
to implement DET.
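To make the PRP requirement concrete, the sketch below builds a small keyed permutation on 64-bit integers and uses it for an equality select over ciphertexts. It is an assumption-laden stand-in, not Blowfish or AES in CMC mode: the permutation is a four-round Feistel network with an HMAC-SHA256 round function, and the names `det_encrypt`, `det_decrypt`, and `ROUNDS` are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4
MASK32 = 0xFFFFFFFF


def _round(key: bytes, index: int, half: int) -> int:
    # Keyed round function: 32 bits of HMAC-SHA256 over (round index, half).
    message = bytes([index]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, message, hashlib.sha256).digest()[:4], "big")


def det_encrypt(key: bytes, value: int) -> int:
    # Feistel network: a bijection on 64-bit integers for any fixed key.
    left, right = value >> 32, value & MASK32
    for i in range(ROUNDS):
        left, right = right, left ^ _round(key, i, right)
    return (left << 32) | right


def det_decrypt(key: bytes, cipher: int) -> int:
    # Run the rounds in reverse to invert det_encrypt.
    left, right = cipher >> 32, cipher & MASK32
    for i in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, i, left), left
    return (left << 32) | right


key = b"demo-det-key"
column = [5, 7, 5, 9]
encrypted_column = [det_encrypt(key, v) for v in column]

# The server evaluates "WHERE col = 5" by comparing ciphertexts only.
target = det_encrypt(key, 5)
matches = [row for row, c in enumerate(encrypted_column) if c == target]
```

Because the mapping is a permutation, equal plaintexts always yield equal ciphertexts and distinct plaintexts never collide, which is exactly what lets the server find rows 0 and 2 without seeing the value 5.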
Order-preserving encryption (OPE). OPE allows the server
to determine order relations between data items based on
their encrypted values, without revealing the data itself. If
x < y, then OPE_K(x) < OPE_K(y), for any secret key K. Therefore,
if a column is encrypted with OPE, the server can perform
range queries when given encrypted constants OPE_K(c1) and
OPE_K(c2) corresponding to the range [c1, c2]. The server can
also perform ORDER BY, MIN, MAX, SORT, etc.
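The order-preserving property and the resulting range query can be illustrated with a toy construction. This is not the OPE scheme used by CryptDB: it is a simple keyed, strictly increasing map for small non-negative integers, built by summing pseudo-random positive gaps, and the names `ope_encrypt` and `max_gap` are assumptions. Its cost grows linearly with the plaintext, so it is only suitable as a demonstration.

```python
import hashlib
import hmac


def ope_encrypt(key: bytes, value: int, max_gap: int = 1000) -> int:
    # Sum one keyed, strictly positive gap per integer in [0, value].
    # Larger plaintexts add more positive gaps, so order is preserved.
    total = 0
    for i in range(value + 1):
        digest = hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()
        total += 1 + int.from_bytes(digest[:4], "big") % max_gap
    return total


key = b"demo-ope-key"
column = [30, 10, 50, 20, 40]
encrypted_column = [ope_encrypt(key, v) for v in column]

# Range query for [20, 40]: the proxy sends only the encrypted bounds.
low, high = ope_encrypt(key, 20), ope_encrypt(key, 40)
in_range = [row for row, c in enumerate(encrypted_column) if low <= c <= high]

# ORDER BY on ciphertexts yields the same row order as on plaintexts.
order_by_cipher = sorted(range(len(column)), key=lambda r: encrypted_column[r])
order_by_plain = sorted(range(len(column)), key=lambda r: column[r])
```

The server selects rows 0, 3, and 4 (plaintexts 30, 20, and 40) using ciphertext comparisons alone, and sorting by ciphertext reproduces the plaintext order.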
OPE is a weaker encryption scheme than DET because
it reveals order. Thus, the CryptDB proxy will only reveal