“In the old days we used to do image
matching,” says Nuno Vasconcelos,
head of the Statistical Visual Computing Laboratory at the University of California, San Diego. Computers would
derive some statistical model of an example image and then look for matches with other images. “It works to
some extent,” says Vasconcelos, “but it
doesn’t work very well.” The programs
would find low-level matches, based
on factors such as color or texture. A
beach scene, all sand and sky, might be
matched with a picture of a train, with
an equal amount of sky and a color
similar to sand.
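The failure Vasconcelos describes can be made concrete with a toy version of low-level matching. The sketch below is an assumption, not his lab's method: color histograms compared by histogram intersection stand in for the "statistical model" of an image, and the pixel colors are made up. It shows a beach scene scoring as a near-match with a train picture that shares its sky and sand colors.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each channel 0-255


def color_histogram(pixels: List[Pixel], bins: int = 4) -> List[float]:
    """Quantize each RGB channel into `bins` levels and return a normalized
    histogram over the bins**3 color cells (a purely low-level description)."""
    step = 256 // bins
    hist = [0.0] * (bins ** 3)
    for r, g, b in pixels:
        cell = (r // step) * bins * bins + (g // step) * bins + (b // step)
        hist[cell] += 1
    return [count / len(pixels) for count in hist]


def histogram_intersection(h1: List[float], h2: List[float]) -> float:
    """Similarity in [0, 1]: 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Illustrative colors, not measurements from real photos.
SKY, SAND, DARK_GRAY = (135, 206, 235), (194, 178, 128), (60, 60, 60)
GREEN, BROWN = (34, 139, 34), (101, 67, 33)

beach = [SKY] * 50 + [SAND] * 50                     # half sky, half sand
train = [SKY] * 50 + [SAND] * 40 + [DARK_GRAY] * 10  # sky plus a sand-colored train
forest = [GREEN] * 50 + [BROWN] * 50

print(histogram_intersection(color_histogram(beach), color_histogram(train)))   # about 0.9
print(histogram_intersection(color_histogram(beach), color_histogram(forest)))  # 0.0
```

Because the histogram records only which colors appear, not what they depict, the train scores almost as high as another beach would.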
Nowadays, Vasconcelos says, the emphasis is on trying to understand what
the image is about. Starting with a set of
images labeled by humans, a machine
learning algorithm develops a statistical model for an entire class of images.
The computer calculates the probability
that a picture is a beach scene—based
on labels such as “beach,” “sand,”
“ocean,” and “vacation”—and then
matches the picture with other images
with the same probability.
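One way to picture this kind of semantic matching is to describe every image by a probability distribution over labels and rank other images by how close their distributions are. This is a minimal sketch under stated assumptions: the label probabilities are typed in by hand rather than produced by a trained model, the five-word vocabulary is illustrative, and KL divergence is an assumed choice of distance.

```python
import math
from typing import Dict, List

# Illustrative label vocabulary; a real system would use many more labels.
LABELS = ["beach", "sand", "ocean", "vacation", "train"]


def semantic_vector(scores: Dict[str, float]) -> List[float]:
    """Normalize per-label scores into a probability distribution over LABELS."""
    raw = [scores.get(label, 0.0) for label in LABELS]
    total = sum(raw)
    return [x / total for x in raw]


def kl_divergence(p: List[float], q: List[float], eps: float = 1e-9) -> float:
    """KL(p || q); eps keeps the log finite when q assigns zero probability."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def rank_by_meaning(query: List[float], database: Dict[str, List[float]]) -> List[str]:
    """Return image names ordered from most to least semantically similar."""
    return sorted(database, key=lambda name: kl_divergence(query, database[name]))


# Hypothetical label probabilities, standing in for a trained model's output.
query = semantic_vector({"beach": 0.5, "sand": 0.2, "ocean": 0.2, "vacation": 0.1})
database = {
    "train_photo": semantic_vector({"train": 0.8, "sand": 0.1, "beach": 0.1}),
    "other_beach": semantic_vector({"beach": 0.4, "sand": 0.3, "ocean": 0.2, "vacation": 0.1}),
}
print(rank_by_meaning(query, database))  # ['other_beach', 'train_photo']
```

Unlike the color comparison, the train photo now ranks last, because it puts almost no probability on "beach," "ocean," or "vacation."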
To train such algorithms, scientists need large sets of labeled images.
While datasets with a few thousand
photos exist, the algorithms become
more accurate with much larger sets.
Fei-Fei Li, an assistant professor at the
Stanford Vision Lab, started developing such a dataset, called ImageNet, with
Kai Li, a computer scientist at
Princeton University.
They started with WordNet, a hierarchical database of English words in
which distinct concepts are grouped
into sets of synonyms called synsets;
there are 80,000 synsets just for nouns.
The researchers entered each of the
synonyms into Internet search engines
to collect about 10,000 candidate images per synset. Then, using labor provided by Amazon Mechanical Turk, an online marketplace where
people earn small payments for
tasks that require human input, they
had people verify whether a candidate
image contained the object listed in
the synset. The goal is to have 500 to
1,000 images per synset. So far, they’ve
amassed more than 11 million labeled
images in about 15,500 categories, putting them between a third and half
of the way to their goal.
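The collection process described above can be sketched as a two-step filter: pool candidates from every synonym, then keep only those that human workers confirm. This is a minimal sketch, not ImageNet's code: `build_synset`, `search`, and `worker_votes` are hypothetical stand-ins for the search engines and Mechanical Turk workers, and the simple majority-vote rule is an assumption rather than the project's actual verification scheme. The default `target` of 1,000 reflects the upper end of the stated 500 to 1,000 images per synset.

```python
from typing import Callable, List


def build_synset(
    synonyms: List[str],
    search: Callable[[str], List[str]],         # hypothetical: query -> candidate image URLs
    worker_votes: Callable[[str], List[bool]],  # hypothetical: URL -> workers' yes/no answers
    target: int = 1000,
    min_agreement: float = 0.5,
) -> List[str]:
    """Collect verified images for one synset (sketch; voting rule is assumed)."""
    # Step 1: pool candidates from every synonym, dropping duplicate URLs.
    seen, candidates = set(), []
    for word in synonyms:
        for url in search(word):
            if url not in seen:
                seen.add(url)
                candidates.append(url)

    # Step 2: keep a candidate only if more than `min_agreement` of the
    # workers say it contains the object; stop once `target` is reached.
    verified = []
    for url in candidates:
        votes = worker_votes(url)
        if votes and sum(votes) / len(votes) > min_agreement:
            verified.append(url)
            if len(verified) == target:
                break
    return verified


# Toy stand-ins for a search engine and crowd workers.
results = {"dog": ["a.jpg", "b.jpg", "c.jpg"], "domestic dog": ["b.jpg", "d.jpg"]}
votes = {"a.jpg": [True, True, False], "b.jpg": [False, False, True],
         "c.jpg": [True, True, True], "d.jpg": [True, False]}

print(build_synset(["dog", "domestic dog"],
                   lambda q: results.get(q, []),
                   lambda url: votes.get(url, [])))  # ['a.jpg', 'c.jpg']
```

Here "d.jpg" is rejected because a 1-to-1 split does not exceed the assumed 50 percent threshold.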
About 100 people participated in
the ImageNet Challenge last summer
to see if they could use the dataset to
train computers to recognize objects
in 1,000 different categories, from
“French fries” to “Japanese pagoda
tree.” Once the computers have shown
they can identify objects, Fei-Fei says,
the next objective will be to recognize
associations between those objects.
Noticing context can aid in object recognition, she explains. “If we see a car
on the road, we don’t keep thinking ‘Is
it a boat? Is it an airplane?’ ”
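Fei-Fei does not describe a specific method here, so the following is only an illustration of the idea. One simple, assumed way to use context is to multiply an appearance-based score for each label by a prior for how plausible that label is in the scene, then renormalize (a naive Bayes-style combination). The function name and all numbers are hypothetical.

```python
from typing import Dict


def rescore_with_context(detector: Dict[str, float],
                         context_prior: Dict[str, float]) -> Dict[str, float]:
    """Weight each label's appearance-only score by how plausible that label is
    in the surrounding scene, then renormalize so the scores sum to 1."""
    combined = {label: score * context_prior.get(label, 0.0)
                for label, score in detector.items()}
    total = sum(combined.values())
    return {label: s / total for label, s in combined.items()}


# Made-up numbers: the object alone is ambiguous, but it sits on a road.
appearance = {"car": 0.40, "boat": 0.35, "airplane": 0.25}
on_road = {"car": 0.90, "boat": 0.01, "airplane": 0.09}

scores = rescore_with_context(appearance, on_road)
print(max(scores, key=scores.get), round(scores["car"], 2))  # car 0.93
```

On appearance alone "car" barely leads; once the road context is factored in, "boat" and "airplane" drop out of contention.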
Using Human Recognition
The Visual Dictionary project at the
Massachusetts Institute of Technology (MIT) also seeks to develop a large
dataset of labeled images, but relies
on the fact that humans can recognize
images even when they’re only 32 × 32
pixels. A Web page displays a mosaic
representing 7.5 million images associated with 53,464 terms, with closely
related words placed near each other
on the mosaic. Each tile on the mosaic
shows the average color of all the pictures found for that term, and clicking
on it displays a box containing a definition and a dozen associated images. As
people click on each tiny picture to verify that it matches the word, the computer records those labels. In another
MIT project, LabelMe, the labeling gets
even more specific, identifying not just
a person, but heads, legs, and torsos, as
well as roads, cars, doors, and so on.
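The per-tile color in the mosaic is a simple average, and the tiny images are ordinary downsampled pictures. This is a minimal sketch under stated assumptions: images are lists of RGB tuples, thumbnails are made by block averaging down to 32 × 32, and the tile color is a plain RGB mean. How the MIT project actually computes its thumbnails and tile colors is not described here, and `downsample`, `tile_color`, and `solid` are hypothetical names.

```python
from typing import List, Tuple

RGB = Tuple[float, float, float]
Image = List[List[RGB]]  # rows of (R, G, B) pixels


def downsample(img: Image, size: int = 32) -> Image:
    """Shrink an image to size x size by averaging non-overlapping blocks.
    Assumes the height and width are multiples of `size`."""
    bh, bw = len(img) // size, len(img[0]) // size
    out = []
    for by in range(size):
        row = []
        for bx in range(size):
            block = [img[y][x]
                     for y in range(by * bh, (by + 1) * bh)
                     for x in range(bx * bw, (bx + 1) * bw)]
            row.append(tuple(sum(p[c] for p in block) / len(block) for c in range(3)))
        out.append(row)
    return out


def tile_color(thumbnails: List[Image]) -> RGB:
    """Average color over every pixel of every thumbnail found for one term."""
    pixels = [p for img in thumbnails for row in img for p in row]
    return tuple(sum(p[c] for p in pixels) / len(pixels) for c in range(3))


def solid(color: RGB, h: int = 64, w: int = 64) -> Image:
    """Build a single-color test image."""
    return [[color] * w for _ in range(h)]


thumbs = [downsample(solid((100, 150, 250))), downsample(solid((200, 180, 120)))]
print(len(thumbs[0]), len(thumbs[0][0]))  # 32 32
print(tile_color(thumbs))                  # (150.0, 165.0, 185.0)
```

Since every thumbnail has the same 32 × 32 size, averaging over all pixels gives the same result as averaging each image's mean color.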
The small size of these photos helps
keep down the demand on computing
capacity, but it also reveals something
Milestones
Ben Franklin Medal and Other CS Awards
The Franklin Institute, the Anita Borg Institute for Women and Technology, and other organizations recently honored leading computer scientists for their research and leadership qualities.
Ben Franklin Medal
The Franklin Institute presented the 2011 Benjamin Franklin Medal in Computer and Cognitive Science to John Anderson, R.K. Mellon University Professor of Psychology and Computer Science at Carnegie Mellon University, for the development of Adaptive Control of Thought. His work reflects the first large-scale computational theory of the process by which humans perceive, learn, and reason, and its application to computer tutoring systems.
Women of Vision Awards
The Anita Borg Institute presented its 2011 Women of Vision Awards. Chieko Asakawa, an IBM Fellow at IBM Research-Tokyo, was honored with the Leadership Award; Mary Lou Jepsen, CEO of Pixel Qi, the Innovation Award; Karen Panetta, professor of electrical and computer engineering at Tufts University, the Social Impact Award; and IBM received the Anita Borg Top Company for Technical Women Award.
CRA Awards
The Computing Research Association (CRA) board of directors selected Charles Lickel, retired executive vice president, IBM, to receive the 2011 A. Nico Habermann Award for his accomplishments at the national, local, and individual levels in increasing the participation of underrepresented groups, particularly researchers in the gay, lesbian, bisexual, and transgender computing community. The CRA board of directors also selected Jeannette M. Wing, President’s Professor of Computer Science and head of the Computer Science Department at Carnegie Mellon University, to receive the 2011 Distinguished Service Award for her national and international thought leadership on computational thinking, and for her extraordinary performance as National Science Foundation Assistant Director for Computer and Information Science and Engineering from 2007 to 2010.
ACM SIGCHI Awards
The ACM Special Interest Group on Computer-Human Interaction presented the Lifetime Practice Award, which recognizes the very best and most influential applications of human-computer interaction, to Larry Tesler, an independent consultant, for his “work at Xerox PARC and Apple [which] has impacted literally every computer user today.” The Lifetime Research Award was presented to Terry Winograd, a computer science professor at Stanford University, for “fundamental contributions to the design of interactive computer systems by taking a broad view of HCI, considering it in the context of natural language processing, machine and human intelligence, cognitive science, human-machine communication, design, and software design.”
—Jack Rosenberger