news
Science | DOI: 10.1145/1941487.1941493
Neil Savage
Sorting Through Photos
Teaching computers to understand pictures could lead
to search engines capable of identifying and organizing
large datasets of visual information.
The number of photos on the Internet is large and rapidly getting larger. The photo-hosting Web site Flickr uploaded its five-billionth
picture on September 18, 2010, and
Pingdom, a Swedish company that
monitors Internet performance, estimated last year that Facebook is adding photos at a rate of 30 billion a year.
MIT’s Visual Dictionary by Antonio Torralba, Hector J. Bernal, Rob Fergus, and Yair Weiss
Such numbers present both a challenge and an opportunity for scientists focused on computer vision. The
challenge lies in figuring out how to
design algorithms that can organize
and retrieve these photos when, to a
computer, a photograph is little more
than a collection of pixels with different light and color values. The opportunity comes from the enormous wealth
of data, both visual and other types,
which researchers can draw on.
“If you think of organizing a photo
collection purely by the image content,
it’s sort of a daunting task, because
understanding the content of photos is a difficult task in computer science,” says Jon Kleinberg, professor
of computer science at Cornell University. People look at two-dimensional
pictures and immediately conjure a
mental three-dimensional image, easily identifying not only the objects in
the picture but their relative sizes, any
interactions between the objects, and
even broad understandings of the season, time of day, or rough location of
the scene. When computers look at a
photo, “they’re seeing it as a huge collection of points drawn on a plane,”
notes Kleinberg.
MIT’s Visual Dictionary project, which is creating a large dataset of labeled images, relies on
humans’ ability to recognize images even when they are just 32 x 32 pixels.
But scientists are finding ways to
extract the hidden information, using
machine learning algorithms to first
identify objects and then to uncover
relationships between them, and relying on hints provided by users’ photo
tags, associated text, and other relationships between different pictures.