Scientists are extracting the hidden information in photos by using machine learning algorithms to identify objects and uncover relationships between them, and by relying on users’ photo tags and associated text.
important about vision, says Antonio Torralba, an associate professor at MIT who heads the project. “If humans are able to do recognition at such low resolution, there are two possibilities. Either the human visual system is amazing or the visual world is not that complex,” he says. In fact, he adds, the world is complex. “Most of the information is because you know a lot about the world,” he says.
The fact that much of the semantic content of a photo is actually supplied by the human viewing it leads researchers to try to derive clues about the content from what humans do with the pictures. Kleinberg makes the analogy with Web search, which looks not only at the textual content of Web pages but also at their structure, such as how they are organized and what hyperlinks they contain. Kleinberg uses the geotagging of photos to learn what they’re about, with the tags supplied either by Flickr users clicking on the Web site’s map or by GPS-based tags automatically created by a user’s camera. It turns out that sorting location tags on a 60-mile scale identifies population centers, and on a 100-meter scale identifies landmarks: the things people like to take pictures of.
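As a rough sketch of how such two-scale grouping might work (my own illustration, not Kleinberg's published method, which relies on more sophisticated clustering), the following Python snippet bins photo coordinates into grid cells sized to each scale and reports the dense ones. The `photo_coords` list, the grid-binning approach, and the `min_photos` cutoff are all illustrative assumptions.

```python
import math
from collections import Counter

# One degree of latitude spans roughly 69 miles; we use this crude
# conversion to size grid cells. Good enough for an illustration.
MILES_PER_DEG = 69.0

def dense_cells(photo_coords, cell_miles, min_photos=50):
    """Bin geotagged photos into square grid cells and keep dense ones.

    photo_coords is a list of (latitude, longitude) pairs. At a ~60-mile
    cell size, dense cells correspond to population centers; at a
    ~100-meter cell size (about 0.062 miles), they correspond to
    individual landmarks.
    """
    cell_deg = cell_miles / MILES_PER_DEG
    counts = Counter()
    for lat, lon in photo_coords:
        # Equirectangular binning: treat latitude/longitude as a flat
        # grid. Fine for a sketch, not for real geospatial work.
        counts[(math.floor(lat / cell_deg), math.floor(lon / cell_deg))] += 1
    return [cell for cell, n in counts.items() if n >= min_photos]

# Hypothetical usage, assuming photo_coords has been loaded elsewhere:
# population_centers = dense_cells(photo_coords, cell_miles=60)
# landmarks = dense_cells(photo_coords, cell_miles=0.062)
```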
For each of those scales, Kleinberg has the computer comb the textual descriptions looking for the words whose use peaks most: not the most commonly used words, but the words that are used more in one particular geographic area than in any other. For instance, in one area the word that stands out most is Boston. Focusing on that region at the 100-meter scale finds the peak term to be Fenway Park. Pictures so labeled might actually be pictures of a car outside Fenway Park, or your dad at the ball game, or a pigeon on the Green Monster (the left-field wall at Fenway), but when the computer compares all the labeled photos to find the biggest cluster of mutually similar images, a photo of the baseball diamond emerges as the typical image of Fenway Park. “There was no a priori knowledge built into the algorithm by us,” Kleinberg says.
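To make those two steps concrete, here is a hedged Python sketch of the idea: score each tag word by how much its frequency within a region exceeds its overall frequency, then pick as the typical image the photo that is similar to the most other photos in the set. The scoring ratio, the `similarity` function, and the `threshold` value are illustrative assumptions, not details from Kleinberg's system.

```python
from collections import Counter

def peak_terms(region_tags, top_k=1):
    """Find the most geographically distinctive words for each region.

    region_tags maps a region id to the list of tag words on photos
    taken there. A word is scored by its frequency within the region
    divided by its frequency across all regions, so globally common
    words rank low and place-specific words such as "fenwaypark" rank
    high.
    """
    global_counts = Counter()
    for tags in region_tags.values():
        global_counts.update(tags)
    total = sum(global_counts.values())

    peaks = {}
    for region, tags in region_tags.items():
        local = Counter(tags)
        n = sum(local.values())
        def score(w):
            return (local[w] / n) / (global_counts[w] / total)
        peaks[region] = sorted(local, key=score, reverse=True)[:top_k]
    return peaks

def typical_photo(photos, similarity, threshold=0.8):
    """Pick the photo similar to the most others as the canonical image.

    similarity(a, b) is an assumed pairwise image-similarity function
    returning a value in [0, 1]. The photo with the largest set of
    similar neighbors stands in for the center of the biggest mutually
    similar cluster.
    """
    def neighbor_count(p):
        return sum(1 for q in photos
                   if q is not p and similarity(p, q) >= threshold)
    return max(photos, key=neighbor_count)
```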
Further Reading

Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., and Fei-Fei, L.
ImageNet: A large-scale hierarchical image database, IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, June 20–25, 2009.

Crandall, D., Backstrom, L., Huttenlocher, D., and Kleinberg, J.
Mapping the world’s photos, Proceedings of the 18th International World Wide Web Conference, Madrid, Spain, April 20–24, 2009.

Hays, J. and Efros, A.A.
IM2GPS: Estimating geographic information from a single image, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, June 23–28, 2008.

Torralba, A., Fergus, R., and Freeman, W.T.
80 million tiny images: A large dataset for non-parametric object and scene recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence 30, 11, November 2008.

Vasconcelos, N.
From pixels to semantic spaces: Advances in content-based image retrieval, IEEE Computer 40, 7, July 2007.
Neil Savage is a science and technology writer based in Lowell, MA. David A. Patterson, University of California, Berkeley, contributed to the development of this article.

© 2011 ACM 0001-0782/11/05 $10.00
In Memoriam
David E. Rumelhart
1942–2011
David E. Rumelhart, a pioneer in computer simulations of perception, died on March 13 in Chelsea, MI, at the age of 68 after suffering from a debilitating neurological condition.
A psychologist, Rumelhart made many contributions to the formal analysis of human cognition, working mainly within the frameworks of mathematical psychology, artificial intelligence, and parallel distributed processing.
While working at the University of California, San Diego, Rumelhart developed a computer simulation of how three or more layers of neurons could work together to process information, which is necessary for the brain to perform complex tasks. This system, which was more sophisticated than previous models, was described in a landmark paper he wrote in 1986 with Geoffrey Hinton and Ronald Williams for Nature, and led to new, more powerful systems for visual object recognition and handwritten character classification.
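As a minimal sketch in the spirit of that work (not Rumelhart's actual model; the network size, learning rate, and the XOR task are assumptions made here for illustration), the following trains a three-layer network with backpropagation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Inputs and targets for XOR, the classic task that a single layer of
# adjustable weights cannot solve but a hidden layer can.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1 = rng.normal(size=(2, 4))   # input -> hidden weights
b1 = np.zeros((1, 4))
W2 = rng.normal(size=(4, 1))   # hidden -> output weights
b2 = np.zeros((1, 1))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    # Forward pass through the three layers (input, hidden, output).
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error derivative layer by layer
    # and nudge the weights downhill on the squared error.
    d_out = (out - y) * out * (1 - out)
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out
    b2 -= 0.5 * d_out.sum(axis=0, keepdims=True)
    W1 -= 0.5 * X.T @ d_h
    b1 -= 0.5 * d_h.sum(axis=0, keepdims=True)

print(out.round(2))  # typically close to [[0], [1], [1], [0]]
```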
Rumelhart was also well known for his textbook, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, written with Jay McClelland, says Hinton, the Raymond Reiter Distinguished Professor of Artificial Intelligence in the computer science department at the University of Toronto. Parallel Distributed Processing described the authors’ computer simulations of perception and provided the first testable models of neural processing. It is regarded as a central text in the field of cognitive science.
The Robert J. Glushko and Pamela Samuelson Foundation honored Rumelhart in 2000 with the creation of the David E. Rumelhart Prize, an annual award given to an individual or team making a significant contemporary contribution to the theoretical foundations of human cognition.