PRIVACY CHALLENGES AND SOLUTIONS
IN THE SOCIAL WEB
By Grigorios Loukides and Aris Gkoulalas-Divanis
Research related to online social networks has addressed a number of important problems related to the storage, retrieval, and management of social network data. However, privacy concerns stemming from the use of social networks, or the dissemination of social network data, have largely
been ignored. And with more than 250 million active Facebook (http://facebook.com) users, nearly half
of whom log in at least once per day [5], these concerns can’t remain unaddressed for long.
Broadly speaking, there are two types of concerns: 1) data access-related issues, where users are allowed access to other users’ data while
using the social network, and 2) data publication-related issues, or
how social network data can be shared with recipients in a privacy-preserving way.
The first type of privacy concern arises because social networks contain rich information that users are only willing to share with
trusted parties, such as users’ real identities, their participation in certain activities or groups, and their multimedia content. Since there are
cases in which users lack control over how this information is shared with
other parties, privacy may be compromised [3]. For example, conversations in blogs may be recorded and published on the comment-tracking Web site coComment (http://cocomment.com) [4] without explicit
user notification. Such publication may create privacy concerns, as in
a case where a user found that messages posted to Citibank, which questioned the security of Citibank’s Web site, had been published on
coComment [12].
Furthermore, users may not realize which of their activities are visible to other users of the social network. For instance, Facebook has
different privacy policies for “photo album” and “profile photos,” with
the former open to more users than the latter.
Ensuring that social networks operate in a privacy-aware manner
requires controlling which users’ information a certain social network
user can access. Although this can be achieved by a simple opt-in/opt-out policy, such as Facebook’s “edit album privacy” feature, a more general solution requires policy
specification and enforcement techniques, or access control. Since
such problems are closely related to the area of data security, we do
not discuss them further here and refer the reader to Sloman and
Lupu [13], and Squicciarini, Shehab, and Paci for details.
Privacy concerns also arise when the owners of social networks
disseminate parts of their network data to untrusted recipients, usually marketers, for analysis or mining. Users may be harmed both emotionally and economically when such analysis reveals their
sensitive information. Assume, for example, that a social network
owner wants to share data with a marketing researcher who aims to analyze the data statistically in order to identify potential customers
based on users’ common habits. Releasing the data intact is obviously
undesirable because it potentially contains users’ identifying information (personal name, contact information, and phone number)
that should not be disclosed.
In addition, users often do not feel comfortable revealing specific
activities or affiliations, or sensitive data, such as brand preferences or
Web browsing history. This necessitates preventing the inference of
such information in released social network data.
Consequently, it is important to identify social network data publication scenarios that may lead to privacy breaches, and develop techniques to prevent them. In what follows, we briefly survey some recent
work focusing on attacks related to data dissemination and methodologies to guard against them.
Privacy Attacks on Published Social Network Data
Before discussing specific attacks, it’s important to mention that a
social network can be modeled as an undirected graph, where each
node represents a distinct social network user having a certain ID or
name, and an edge connecting two nodes corresponds to a relationship, or friendship, between the corresponding users. In addition, we
assume that each node carries a label containing a user’s attributes. An
example of a social network is illustrated in Figure 1(a).
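The graph model described above can be sketched in a few lines of Python. This is only an illustration: the user names, friendships, and attribute values are hypothetical and do not reproduce Figure 1, and `build_graph` is a helper name introduced here.

```python
from collections import defaultdict


def build_graph(edges):
    """Return an undirected graph as {user: set of friends}."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)  # undirected: record the friendship both ways
    return dict(adjacency)


# Hypothetical users and friendships (not the data of Figure 1).
friendships = [
    ("Mary", "Tom"), ("Mary", "Ann"), ("Mary", "Bob"),
    ("Mary", "Eve"), ("Tom", "Ann"), ("Bob", "Eve"),
]

# Each node carries a label holding the user's attributes (made-up values).
labels = {
    "Mary": {"age": 29, "city": "Athens"},
    "Tom": {"age": 34, "city": "Zurich"},
    "Ann": {"age": 27, "city": "Athens"},
    "Bob": {"age": 41, "city": "London"},
    "Eve": {"age": 30, "city": "Zurich"},
}

graph = build_graph(friendships)

# A node's degree is the number of friends the corresponding user has.
degree = {user: len(friends) for user, friends in graph.items()}
```

Storing each edge in both directions keeps neighbor lookups and degree computation simple, which is convenient for the attacks discussed next.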
Releasing social network data may allow three different types of
attacks: identity, link, and content disclosure [11], which will be discussed below. To keep the discussion simple, we do not formally characterize these attacks (see Zhou et al. [18] for a formal presentation).
Identity disclosure occurs when published social network
data allow the identity of social network users to be revealed. A
straightforward way to thwart identity disclosure is
de-identification, that is, removing the user ID or replacing it with a pseudonym.
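As a minimal sketch of de-identification by pseudonym replacement, the function below maps each user ID to a randomly assigned integer and rewrites the graph accordingly. The function name `deidentify`, the `seed` parameter, and the toy graph are assumptions made for illustration only.

```python
import random


def deidentify(adjacency, seed=None):
    """Replace user IDs with integer pseudonyms; return (graph, mapping)."""
    rng = random.Random(seed)
    names = sorted(adjacency)
    pseudonyms = list(range(1, len(names) + 1))
    rng.shuffle(pseudonyms)  # random assignment so order leaks nothing
    mapping = dict(zip(names, pseudonyms))
    published = {
        mapping[user]: {mapping[friend] for friend in friends}
        for user, friends in adjacency.items()
    }
    return published, mapping


# Hypothetical input graph (not the data of Figure 1).
graph = {
    "Mary": {"Tom", "Ann", "Bob", "Eve"},
    "Tom": {"Mary", "Ann"},
    "Ann": {"Mary", "Tom"},
    "Bob": {"Mary", "Eve"},
    "Eve": {"Mary", "Bob"},
}

published, mapping = deidentify(graph, seed=0)

# The mapping stays with the data owner; only `published` is released.
# The structure is untouched: every node keeps the same number of friends.
assert len(published[mapping["Mary"]]) == len(graph["Mary"])
```

Note that the pseudonyms hide the names but leave the graph structure, including every node's degree, exactly as it was.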
Unfortunately, this simple process is not sufficient to prevent identity disclosure when the data recipient possesses some form of background knowledge that can be used in conjunction with the graph structure to
discover the identity of individuals. As demonstrated by Hay et al. [6],
the risk of identifying a user depends on the structural similarity of the
nodes in the released graph and on the attacker’s background knowledge.
More specifically, Hay et al. [6] assumed that an attacker knows a
user’s ID and either the degree of his or her node (the number of this user’s
friends) or the existence of a specific type of subgraph around him or
her. Such information may be available through sources external to the
released data or obtained by intruding into the network.
Consider, for example, the graph shown in Figure 1(a) and an
attacker knowing that Mary has four friends. Using this knowledge,
the attacker is able to infer that Mary is represented using “3” in the
published data of Figure 1(b).
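A degree-based attack of this kind can be sketched as follows. The published graph here is a hypothetical toy example whose structure is assumed, not a reproduction of Figure 1(b), and the function names are introduced for illustration. The risk estimate assumes the attacker guesses uniformly among the nodes that match the known degree.

```python
def candidates_by_degree(published, known_degree):
    """Return the pseudonymized nodes whose degree matches the attacker's knowledge."""
    return sorted(
        node for node, friends in published.items()
        if len(friends) == known_degree
    )


def reidentification_risk(published, known_degree):
    """Chance of a correct guess, assuming a uniform pick among the candidates."""
    matches = candidates_by_degree(published, known_degree)
    return 1 / len(matches) if matches else 0.0


# Hypothetical de-identified graph: names were replaced by integers.
published = {
    1: {2, 3},
    2: {1, 3},
    3: {1, 2, 4, 5},
    4: {3, 5},
    5: {3, 4},
}

# The attacker knows only that Mary has four friends.
print(candidates_by_degree(published, 4))   # [3]: a unique match, so Mary is exposed
print(reidentification_risk(published, 4))  # 1.0

# A user with two friends hides among four structurally similar nodes.
print(candidates_by_degree(published, 2))   # [1, 2, 4, 5]
print(reidentification_risk(published, 2))  # 0.25
```

The contrast between the two queries illustrates why the risk depends on structural similarity: the more nodes share the same degree, the less a known degree tells the attacker.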