Figure 1: An example of a social network and its de-identified counterpart where (a), on the left, shows an original social network
and (b), on the right, a de-identified social network.
Link Disclosure
As mentioned previously, some relationships, or links, between social
network users may also convey sensitive information. Link disclosure
occurs when sensitive link structure information is leaked as a result of
social network data publication, or inferred by compromised social
network users.
In the first context, inferring link structure from anonymized data,
the social network owner wants to publish the social network to
untrusted recipients for analysis purposes in a way that sensitive relationships between users cannot be inferred from the published data,
for example, through the use of graph mining techniques [17].
Backstrom et al. [2] considered two different types of attacks. The
first, called an active attack, involves creating new user accounts
and establishing relationships with existing users. This allows a
malicious data recipient, or attacker, to identify the “fake” users that
were created and their relationships to other users in the published
data. When some of these relations are sensitive, link disclosure may
occur as a result.
The second type of attack is called passive and involves an attacker
who, unlike in the active attack, has not tampered with the network data
prior to its publication, but is able to locate himself or herself in the
published data, along with the sensitive relations of the users to whom
he or she is related. Experiments conducted on social network data
consisting of 4.4 million users extracted from LiveJournal.com
(http://livejournal.com), a popular blogging site, demonstrated that creating
only seven new user accounts results in compromising approximately
2,400 links on average.
As an example of an active link disclosure attack, consider the network
data of Figure 1(a), in which every user has knowledge of his or
her immediate neighbors only, and suppose an attacker controls Greg and
Jim as “fake” users, both of whom are related to Mary. When this data is
published, as in Figure 1(b), the attacker can identify Mary in the released
data and disclose her potentially sensitive links to Anne and Tom.
Such an attack is realistic in networks where a user can interact with
other users without their consent, for example, via email or instant
messages. Similarly, because Brad can identify himself in the data of
Figure 1(b) (he is the only user with exactly one neighbor), he can mount a
passive attack, disclosing Anne’s links.
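Both attacks in this example can be sketched in a few lines of Python. The sketch below rests on several assumptions that are not in the article: Figure 1 is not reproduced in the text, so the edge list is a hypothetical graph chosen only to agree with the facts stated above (Greg and Jim are linked to Mary, Mary is linked to Anne and Tom, and Brad is the only user with one neighbor); the numeric pseudonyms and all function names are made up for illustration; and matching nodes by degree is a deliberately simplified stand-in for whatever distinctive pattern a real attacker would rely on.

```python
from itertools import combinations

# Hypothetical edge list (an assumption, not the actual Figure 1(a)).
EDGES = [
    ("Greg", "Mary"), ("Jim", "Mary"), ("Greg", "Jim"),
    ("Mary", "Anne"), ("Mary", "Tom"), ("Anne", "Tom"), ("Anne", "Brad"),
]

# Hypothetical pseudonyms used by the data owner when de-identifying.
PSEUDONYMS = {"Greg": 1, "Jim": 2, "Mary": 3, "Anne": 4, "Tom": 5, "Brad": 6}


def build_graph(edges):
    """Build an undirected adjacency map from a list of (user, user) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def deidentify(graph, pseudonyms):
    """Replace every user name with its pseudonym, keeping the links."""
    return {pseudonyms[u]: {pseudonyms[v] for v in nbrs}
            for u, nbrs in graph.items()}


def passive_attack(published, own_degree):
    """Return the attacker's own pseudonym if his or her degree is unique."""
    candidates = [n for n, nbrs in published.items() if len(nbrs) == own_degree]
    return candidates[0] if len(candidates) == 1 else None


def active_attack(published, fake_degrees):
    """Locate two linked fake accounts by their degrees, then return the
    remaining links of every user connected to both of them."""
    wanted = sorted(fake_degrees)
    matches = [
        (a, b) for a, b in combinations(published, 2)
        if b in published[a]
        and sorted((len(published[a]), len(published[b]))) == wanted
    ]
    if len(matches) != 1:
        return None  # the planted pattern is ambiguous in this graph
    a, b = matches[0]
    targets = published[a] & published[b]
    return {t: published[t] - {a, b} for t in targets}


published = deidentify(build_graph(EDGES), PSEUDONYMS)

# Brad knows he has exactly one neighbor, so he can find himself.
brad = passive_attack(published, own_degree=1)

# The attacker knows Greg and Jim are linked and each has two neighbors.
mary_links = active_attack(published, fake_degrees=(2, 2))
```

Under these assumptions, `brad` is pseudonym 6, whose single neighbor (Anne, pseudonym 4) exposes all of her links, and `mary_links` maps Mary's pseudonym 3 to the pseudonyms of Anne and Tom. In a graph where several users share the same degree, either function returns `None`, which hints at why larger networks call for more distinctive patterns.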
Now let’s consider what happens when link structure is inferred
from unpublished data. Unlike in the previous data publishing scenario,
there are cases in which the data owner is not willing to reveal the
links that exist among users, since these links can be an asset used to
maximize profits (through advertising products to users, for example).
However, it has been shown that an adversary can still reveal sensitive
relationships by exploiting the link structure of a number of
non-anonymous nodes that are bribed to reveal their links to the adversary.
Specifically, Korolova et al. [8] showed that the number of compromised
nodes needed to reveal links decreases exponentially as the maximum
distance at which incident nodes remain visible to a compromised node
increases. Using social network data extracted from LiveJournal.com, the
authors also showed that, for a realistic maximum distance (3), bribing a
very small percentage of users (36 out of 572,949 users, or roughly 0.006
percent) suffices to disclose the relationships of 80 percent of the
nodes to their immediate neighbors.
To exemplify this attack scenario, assume now that the data of
Figure 1(a) is extracted from LinkedIn (http://linkedin.com), a social
network where each user has knowledge of his or her friends and their
immediate friends, and that an attacker has bribed Mary. By doing so,
the attacker is able to reveal the link structure of the entire network.
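The bribery scenario can be made concrete with a short Python sketch. It is illustrative only and rests on assumptions that are not in the article: the edge list is a hypothetical graph consistent with what the text says about Figure 1(a), not the figure itself; the function names are invented; and `radius` is a convention adopted here, where 0 means a bribed user reveals only his or her own links and 1 means the links of immediate friends are revealed as well, matching the LinkedIn-style visibility described above.

```python
from collections import deque

# Hypothetical edge list (an assumption, not the actual Figure 1(a)).
EDGES = [
    ("Greg", "Mary"), ("Jim", "Mary"), ("Greg", "Jim"),
    ("Mary", "Anne"), ("Mary", "Tom"), ("Anne", "Tom"), ("Anne", "Brad"),
]


def build_graph(edges):
    """Build an undirected adjacency map from a list of (user, user) pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def nodes_within(graph, start, radius):
    """Breadth-first search: all users at most `radius` hops from `start`."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == radius:
            continue  # do not expand beyond the visibility radius
        for nbr in graph[node]:
            if nbr not in distance:
                distance[nbr] = distance[node] + 1
                queue.append(nbr)
    return set(distance)


def revealed_links(graph, bribed, radius):
    """Links the adversary learns: every link incident to a user who is
    within `radius` hops of some bribed user."""
    revealed = set()
    for user in bribed:
        for node in nodes_within(graph, user, radius):
            for nbr in graph[node]:
                revealed.add(frozenset((node, nbr)))
    return revealed


graph = build_graph(EDGES)
all_links = {frozenset(edge) for edge in EDGES}

# Fraction of all links exposed by bribing Mary alone.
coverage = len(revealed_links(graph, ["Mary"], radius=1)) / len(all_links)
```

On this hypothetical graph, bribing Mary with `radius=1` exposes every link (`coverage` is 1.0), whereas `radius=0` exposes only her own four links, and bribing Brad with `radius=1` exposes three. Small as it is, the example reflects the trend reported above: the farther a compromised user can see, the fewer users need to be bribed.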
❝Ensuring that social networks operate in
a privacy-aware manner requires controlling
which users' information a certain social
network user can access. Although this can
be achieved by simple opt-in/opt-out policy,
a more careful consideration of the problem
involves flexible privacy policy specification and
enforcement techniques, or access control.❞