raphy—flower, family, sunset, black
and white, among others—we can find
place names by looking across the photos
of millions of users and finding tags
that are used frequently in a particular
place and infrequently outside of it. We
can also generate a visual description
of each place by finding a representative
image that summarizes that place
well. To do this, we deem each photograph
taken in a place as a vote for the
most interesting viewpoint at that location.
Intuitively, we then try to find the
viewpoint that receives the most votes
by looking for groups of photos that are
visually similar and taken by many different
users.
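As a rough illustration of the tag idea above, the following Python sketch scores each tag by the share of its use that falls inside one place. The function name `distinctive_tags`, the `(user, place, tags)` record layout, the `min_users` cutoff, and the scoring ratio are all assumptions made for this example; the article does not specify the exact statistic used.

```python
from collections import defaultdict

def distinctive_tags(photos, place, min_users=2, top_k=3):
    """Rank tags that are common inside `place` and rare elsewhere.

    `photos` is an iterable of (user, place, tags) tuples (hypothetical layout).
    Distinct users are counted instead of photos so that one prolific
    photographer cannot dominate the result.
    """
    # tag -> place -> set of users who applied the tag there
    users_by_tag_place = defaultdict(lambda: defaultdict(set))
    for user, photo_place, tags in photos:
        for tag in tags:
            users_by_tag_place[tag][photo_place].add(user)

    scored = []
    for tag, by_place in users_by_tag_place.items():
        inside = len(by_place.get(place, ()))
        if inside < min_users:
            continue  # used by too few people here to be a reliable name
        total = sum(len(users) for users in by_place.values())
        scored.append((inside / total, inside, tag))

    # Highest share first, then most users, then alphabetical for stability.
    scored.sort(key=lambda item: (-item[0], -item[1], item[2]))
    return [tag for _, _, tag in scored[:top_k]]


photos = [
    ("ann", "paris", ["eiffel", "sunset", "paris"]),
    ("bob", "paris", ["paris", "family"]),
    ("cat", "paris", ["paris", "eiffel"]),
    ("dan", "london", ["sunset", "family", "london"]),
    ("eve", "london", ["london", "family"]),
    ("ann", "london", ["sunset", "london"]),
]
print(distinctive_tags(photos, "paris"))  # ['paris', 'eiffel']
```

Generic tags such as "sunset" and "family" drop out for Paris because too few Paris users applied them, while "paris" and "eiffel" are used only there.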
Figure 6. Photo network for finding representative images.
Figure 7. Trails of human movement in Manhattan.
and we connect pairs of photos having a high degree of visual similarity.
Then we apply a graph-clustering algorithm to find tightly connected components of the graph (that is, groups
of nodes that are connected to many
other nodes within the group but not
to many nodes outside the group) and
choose one of these photos as a representative image. A sample graph of this
type is shown in Figure 6. To decide
which nodes to connect, we measure
visual similarity using an automated
technique called SIFT (scale-invariant
feature transform) feature matching,14
illustrated in Figure 5. Note that this
summary image is not necessarily the
best photo of a particular place—it
will likely be a canonical tourist photo
rather than a more unusual yet captivating viewpoint captured by a professional photographer.
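The voting idea can be sketched in a few lines of Python. This is a simplified stand-in, not the procedure used for the maps: it assumes the visually similar pairs have already been found (for example, by SIFT matching), it uses plain connected components in place of the unspecified graph-clustering algorithm, and the names `representative_photo`, `owners`, and `similar_pairs` are hypothetical.

```python
from collections import defaultdict

def representative_photo(owners, similar_pairs):
    """Pick one photo from the group backed by the most distinct users.

    owners: dict mapping photo id -> user id (every photo must appear here).
    similar_pairs: iterable of (photo, photo) pairs judged visually similar.
    """
    neighbors = defaultdict(set)
    for a, b in similar_pairs:
        neighbors[a].add(b)
        neighbors[b].add(a)

    # Group photos into connected components with a depth-first search.
    seen, groups = set(), []
    for start in owners:
        if start in seen:
            continue
        seen.add(start)
        stack, group = [start], []
        while stack:
            photo = stack.pop()
            group.append(photo)
            for other in neighbors[photo]:
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
        groups.append(group)

    # Each distinct user counts as one vote; group size breaks ties.
    best = max(groups, key=lambda g: (len({owners[p] for p in g}), len(g)))
    # Within the winning group, take the photo similar to the most others.
    return max(sorted(best), key=lambda p: len(neighbors[p]))


owners = {"p1": "ann", "p2": "bob", "p3": "cat",
          "p4": "ann", "p5": "ann", "p6": "ann"}
similar_pairs = [("p1", "p2"), ("p2", "p3"),
                 ("p4", "p5"), ("p5", "p6"), ("p4", "p6")]
print(representative_photo(owners, similar_pairs))  # p2
```

Both groups contain three photos, but the first is backed by three different users and the second by only one, so the first wins; `p2` is chosen because it is similar to both other photos in its group.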
The map in Figure 1 was produced
completely automatically using this
analysis on tens of millions of images
downloaded from Flickr. Starting with
a blank slate, we plotted the raw photo geotags to produce the map in the
background and then applied mean-shift clustering to locate the 30 most
photographed cities on Earth. For each
of those cities, we extracted the city’s
name by looking for distinctive text
tags and found the name of the most
photographed landmark within the
city. Then we extracted a representative image for that landmark. While
the analysis is not perfect—a human
would have chosen a more appropriate
image of Phoenix than a bird on a baseball field, for example—the result is a
compelling summary of North America, produced automatically by analyzing the activity of millions of Flickr
users. Maps for other continents, regions, and cities of the world are available at our project Web site.8
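For readers unfamiliar with mean-shift clustering, here is a minimal Python sketch of the flat-kernel variant on two-dimensional points. It is an illustration under simplifying assumptions: coordinates are treated as planar (real latitude and longitude would need a proper distance measure), the `bandwidth`, `iterations`, and merge threshold are arbitrary choices, and the brute-force neighbor search would not scale to tens of millions of geotags.

```python
import math

def mean_shift(points, bandwidth, iterations=30):
    """Flat-kernel mean shift; returns (mode, count) pairs, largest first."""
    def shift(p):
        # Move p to the average of all data points within `bandwidth` of it.
        near = [q for q in points if math.dist(p, q) <= bandwidth]
        if not near:
            return p
        return (sum(q[0] for q in near) / len(near),
                sum(q[1] for q in near) / len(near))

    clusters = []  # each entry is [mode, number of points that converged to it]
    for p in points:
        for _ in range(iterations):
            p = shift(p)
        # Merge with an existing mode if this point converged close to it.
        for cluster in clusters:
            if math.dist(cluster[0], p) < bandwidth / 2:
                cluster[1] += 1
                break
        else:
            clusters.append([p, 1])

    clusters.sort(key=lambda c: -c[1])
    return [(mode, count) for mode, count in clusters]


points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),  # dense hotspot
          (5.0, 5.0), (5.1, 5.0),                          # smaller hotspot
          (10.0, 0.0)]                                     # isolated photo
for mode, count in mean_shift(points, bandwidth=1.0):
    print(mode, count)
```

Each point climbs toward the nearest peak in photo density, so the peaks with the most converging points correspond to the most photographed locations. Taking the first 30 entries of the sorted result would mirror the selection of the 30 most photographed cities described above.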
This analysis is reminiscent of sociologist Stanley Milgram’s work during
the 1970s studying people’s “psychological maps,” their mental images
of how the physical world is laid out.17
He asked Parisians to draw freehand
maps of their city and then compared
these maps with the factual geography.
Milgram found that the maps were
highly variable and largely inaccurate
but that most people tended to anchor
their maps around a few key landmarks
such as the Seine River and Notre