bors in the network. Given a raw set of
geotagged photos, however, we do not
know which are accurate.
To overcome this problem we have
developed a new technique that uses
the image network to combine these
position estimates in a more intelligent, robust way, “averaging out” errors in the noisy observations by passing geometric information between
nodes in the image network. This algorithm uses a message-passing strategy
based on a technique known as loopy
belief propagation, commonly used in
machine learning, computer vision,
and other areas.18 This algorithm is
scalable and can find good solutions
to highly nonlinear problems. While
complex, our algorithm starts with the
simple idea that each image should repeatedly average its location with that
of its neighbors, hence using the graph
to smooth noisy location estimates. Because of the extreme noise, this simple
averaging approach doesn’t work well;
therefore, we developed a more sophisticated approach.9
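The simple starting idea above, in which each image repeatedly averages its location with that of its neighbors, can be sketched as follows. This is only the naive baseline, not the belief-propagation algorithm itself; the function name `smooth_positions`, the 2D tuple positions, and the dictionary-based graph are assumptions made for illustration.

```python
def smooth_positions(positions, neighbors, rounds=10):
    """Naive graph smoothing (illustrative baseline only).

    positions: dict mapping image id -> (x, y) noisy location estimate
    neighbors: dict mapping image id -> list of neighboring image ids
    Each round replaces every image's position with the mean of its own
    position and its neighbors' positions, using the previous round's values.
    """
    current = dict(positions)  # copy so the caller's data is not modified
    for _ in range(rounds):
        updated = {}
        for image, own in current.items():
            group = [own] + [current[n] for n in neighbors.get(image, [])]
            mean_x = sum(p[0] for p in group) / len(group)
            mean_y = sum(p[1] for p in group) / len(group)
            updated[image] = (mean_x, mean_y)
        current = updated
    return current


if __name__ == "__main__":
    # Hypothetical three-image chain; "c" has a badly wrong geotag.
    positions = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (10.0, 0.0)}
    neighbors = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    print(smooth_positions(positions, neighbors, rounds=1))
```

A plain mean lets a single badly wrong geotag pull its neighbors toward it, and with many rounds every connected image drifts toward a common point, which is consistent with the observation that simple averaging does not work well on very noisy data.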
The message-passing process described here repeats for a number
of rounds, so each image repeatedly
updates its position based on information from its neighbors. This algorithm results in fairly accurate camera
positions, and applying standard optimization techniques (such as gradient descent) using these positions as
a starting point can yield further improvements. With this algorithm we
have built some very large 3D models,
including the reconstructions of the
city of Dubrovnik and parts of Rome
shown in Figure 9. To process these
large problems, we implemented
the algorithm using the MapReduce
framework and ran it as jobs on a
large Hadoop cluster. (For more information, see our project’s Web page.10)
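To illustrate why a round of neighbor-to-neighbor message passing fits the MapReduce model, here is a minimal single-process sketch. It does not use the Hadoop API, and it substitutes a plain neighbor average for the actual belief-propagation messages; `map_image`, `reduce_image`, and `run_round` are hypothetical names introduced for this example.

```python
from collections import defaultdict


def map_image(image, position, neighbors):
    """Map step: emit this image's position to itself and to each neighbor."""
    yield image, position
    for neighbor in neighbors:
        yield neighbor, position


def reduce_image(image, received):
    """Reduce step: combine all positions received for one image (here, a mean)."""
    mean_x = sum(p[0] for p in received) / len(received)
    mean_y = sum(p[1] for p in received) / len(received)
    return image, (mean_x, mean_y)


def run_round(positions, graph):
    """Simulate one map/shuffle/reduce round in memory."""
    inbox = defaultdict(list)  # shuffle phase: group emitted values by key
    for image, position in positions.items():
        for key, value in map_image(image, position, graph.get(image, [])):
            inbox[key].append(value)
    return dict(reduce_image(image, msgs) for image, msgs in inbox.items())


if __name__ == "__main__":
    # Hypothetical three-image chain with symmetric neighbor lists.
    positions = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (10.0, 0.0)}
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    for _ in range(3):  # several rounds would be chained as successive jobs
        positions = run_round(positions, graph)
    print(positions)
```

Because each image's update depends only on the messages addressed to it, the reduce calls are independent of one another, which is the property that would let a framework spread them across many machines.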
In other work on the 3D modeling
problem, we reconstructed all of the
major sites in Rome from hundreds of
thousands of Flickr photos in less than
24 hours (thus reconstructing “Rome
in a Day”).3,19
While photo-sharing sites such as
Flickr and Facebook continue to grow
at a breathtaking pace, they still do
not have enough images to reach our
eventual goal of reconstructing the
entire world in 3D. The main problem
is that the geospatial distribution of
photographs is highly nonuniform, as
noted in the previous section: there
are hundreds of thousands of photos
of Notre Dame but virtually none of the
café across the street.
Figure 10. PhotoCity.
One solution to this problem is to
entice people to take photos of underrepresented places through
gamification. This is the idea behind PhotoCity,
an online game developed in collaboration with the University of Washington. In PhotoCity, teams of players
compete against one another by taking
photos at specific points in space to
capture flags and buildings.22 Through
this game, we collected more than
100,000 photos of the Cornell and University of Washington campuses over a
period of a few weeks. We used these
photos to reconstruct large portions
of the two campuses, including areas that otherwise did not have much
photographic coverage on sites such
as Flickr. Figure 10 shows a screenshot
of the PhotoCity interface (left), an
overhead map depicting the state of
the game, alongside a few 3D models
created from photos uploaded by
players (right).
Creating a successful game involved two key challenges: building a
robust online system for users to upload photos for processing, and designing the game mechanics so
that users were excited about playing. To address the first challenge, we
built a version of our 3D reconstruction algorithm that could take a new
photo of a building and quickly integrate it into our current 3D model of
that building, updating that model
with any new information contributed
by that photo.
For the second challenge of designing
effective game mechanics, we developed
a mix of incentives. One set
of incentives involved competition at
different levels (for example, between
students at the same school, as well
as a race for each school to build the
best model). Another set involved giving
each player visual feedback about
how much he or she contributed to the
model, by showing 3D points created
by that player’s photos and by updating
models so that players could see
the progress of the game as a whole
over time. A survey of players after
the conclusion of the competition revealed
that different players were motivated
by different incentives; some
were driven by competition, while others
simply enjoyed seeing the virtual
world grow over time.
Future Work
This article has presented some of our
initial work on unlocking the information latent in large photo-sharing
Web sites using network-analysis algorithms, but the true promise of this
type of analysis is yet to be realized. The
opportunities for future work in this
area lie along two different lines. First,
new algorithms are needed to extract
visual content more efficiently and
accurately: the algorithms presented
here produce incorrect results on some
specific types of scenes, for example,
and they are relatively compute-intensive, requiring many hours on large
clusters of computers to process just a
few thousand images.
Second, this type of analysis could
be applied to other disciplines. Many
scientists are interested in studying
the world and how it has changed
over time, including archaeologists,
architects, art historians, ecologists,
urban planners, and so on. As a specific example, the 3D reconstruction