There is a long history of proposed privacy definitions, new vulnerabilities discovered, and amended privacy definitions developed only to be broken once again. As privacy concerns spread, parallel copies of this process are spawned in many research areas. Fortunately, current research has identified many best practices for engineering robust privacy protections for sensitive data. Although they can be formalized in a mathematically rigorous way, we present them at a more intuitive level, leveraging the following privacy definitions as sources of examples.
Definition 1 (ε-differential privacy9,11). An algorithm M satisfies ε-differential privacy if for each of its possible outputs ω and for every pair of databases D₁, D₂ that differ in the addition or removal of a single record, Pr(M(D₁) = ω) ≤ e^ε · Pr(M(D₂) = ω).
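To make the guarantee concrete, the following is a minimal sketch (in Python; not from the article, and the name dp_count is illustrative) of a mechanism satisfying ε-differential privacy: a counting query changes by at most 1 when a single record is added or removed, so adding Laplace noise with scale 1/ε suffices.

    import numpy as np

    def dp_count(records, predicate, epsilon):
        # Number of records satisfying the predicate; adding or removing one
        # record changes this count by at most 1 (sensitivity 1).
        true_count = sum(1 for r in records if predicate(r))
        # Laplace noise with scale sensitivity/epsilon yields epsilon-DP.
        return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

    # Example: a differentially private count of respondents aged 65 or older.
    ages = [23, 67, 41, 72, 19, 66]
    print(dp_count(ages, lambda a: a >= 65, epsilon=0.1))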
Unfortunately, privacy definitions are not one-size-fits-all. Each application could have its own unique privacy requirements. Working independently, researchers from disparate fields rediscover similar privacy technologies, along with their weaknesses, new fixes, and other vulnerabilities. Our goal here is to synthesize some of the latest findings in the science of data privacy in order to explain considerations and best practices important for the design of robust privacy definitions for new applications. We begin by describing best practices, then explain how they lead to a generic template for privacy definitions, explore various semantic privacy guarantees achievable with this template, and end with an example of a recent privacy definition based on the template, applying it to privacy-preserving k-means clustering.
Desiderata of Privacy Definitions
When data is collected, the curator, with the aid of a privacy definition, puts it in a form that is safe to release. A privacy definition is a specification for the behavior of randomized and deterministic algorithms. Algorithms that satisfy the spec are called privacy mechanisms. The curator first chooses a privacy definition, then a privacy mechanism M satisfying the definition. The curator will run M on the sensitive data, then grant external users access to the output of M, or the “sanitized output.”
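As an illustration of this workflow, here is a brief sketch (in Python; not from the article, with hypothetical names such as RandomizedResponse) of a curator running a classic privacy mechanism, randomized response, and releasing only its output; with reporting probability e^ε/(1 + e^ε) it satisfies ε-differential privacy with respect to changing any single respondent's bit.

    import math
    import random

    class RandomizedResponse:
        # Report each bit truthfully with probability p = e^eps / (1 + e^eps)
        # and flipped otherwise, so the output distribution changes by at most
        # a factor of e^eps when one respondent's bit changes.
        def __init__(self, epsilon):
            self.p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))

        def run(self, sensitive_bits):
            return [b if random.random() < self.p_truth else 1 - b
                    for b in sensitive_bits]

    # The curator picks a privacy definition, then a mechanism M satisfying it,
    # runs M on the sensitive data, and grants access only to the output of M.
    mechanism = RandomizedResponse(epsilon=math.log(3))  # p_truth = 0.75
    sanitized_output = mechanism.run([1, 0, 1, 1, 0])
    print(sanitized_output)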