understand the topics and inter-topic relationships among the hundreds of thousands of grants awarded by it and sister agencies.
The ultimate application may be to help understand how the human mind works. Steyvers is experimenting with topic modeling to shed light on how humans retrieve words from memory based on associations with other words. He runs the models on educational documents to produce crude approximations of the topics learned by students, then compares the students’ and the models’ accuracy of recall based on word associations. Sometimes the models make mistakes in their word and topic associations, and those errors are shedding light on the memory mistakes humans make. What’s needed, Steyvers says, is nothing less than “a model of the human mind.”
Meanwhile, computer scientists are looking for ways to make the algorithms more efficient and to structure problems for parallel processing, so that huge problems, such as topic modeling the entire World Wide Web, can be run on large clusters of computers.
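As a small-scale illustration of that direction, and a rough sketch rather than any particular group’s system, the snippet below uses the open source gensim library (not mentioned in the article) to spread LDA inference across several worker processes on one machine; cluster-scale systems distribute the corpus across many machines in the same spirit. The documents, topic count, and worker count are toy placeholders.

    # Minimal sketch: parallelizing LDA training with gensim's multicore trainer.
    # The documents and parameters are toy placeholders, not from the article.
    from gensim import corpora
    from gensim.models import LdaMulticore

    # A tiny stand-in for a large document collection.
    texts = [
        ["topic", "model", "document", "word", "inference"],
        ["cluster", "parallel", "worker", "compute", "scale"],
        ["document", "word", "topic", "model", "corpus"],
    ]

    dictionary = corpora.Dictionary(texts)
    corpus = [dictionary.doc2bow(text) for text in texts]

    # 'workers' sets how many processes share the inference work; real corpora
    # benefit from more workers and, at Web scale, from spreading across machines.
    lda = LdaMulticore(corpus, id2word=dictionary, num_topics=2, workers=4, passes=10)
    print(lda.print_topics())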
Fernando Pereira, a research director at Google, says the company is investigating a number of experimental probabilistic topic modeling systems. The systems could provide better Google search results by grouping similar terms based on context. A topic model might discover, for
instance, that a search for the word “parts,” used in an automobile context, should also match “accessories” when that word is used in an automobile context. (The two words are treated as synonyms when both are used in the same context; in this case, automobiles.) Google does some of that now on a limited basis using heuristic models, but those models tend to require a great deal of testing and tuning, Pereira says.
“I can’t point to a major success yet with the LDA-type models, partly because the inference is very expensive,” Pereira says. “While they are intriguing, we haven’t yet gotten to the point that we can say, ‘Yes, this is a practical tool.’”
But, says Tom Griffiths, director of the Computational Cognitive Science Lab at the University of California, Berkeley, “We are seeing a massive growth in people applying these models to new problems. One of the advantages of this [LDA] framework is it’s pretty easy to define new models.”
Further Reading
Blei, D. and Lafferty, J.
Dynamic topic models. Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, June 25–29, 2006.

Blei, D. and Lafferty, J.
Topic models. Text Mining: Classification, Clustering, and Applications (Srivastava, A. and Sahami, M., Eds.), Taylor & Francis, London, England, 2009.

Chang, J., Boyd-Graber, J., Gerrish, S., Wang, C., and Blei, D.
Reading tea leaves: how humans interpret topic models. Twenty-Third Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, Dec. 7–12, 2009.

Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer, T.K., and Harshman, R.
Indexing by latent semantic analysis. Journal of the American Society for Information Science 41, 6, 1990.

Newman, D., Chemudugunta, C., Smyth, P., and Steyvers, M.
Statistical entity-topic models. The Twelfth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Philadelphia, PA, August 23–26, 2006.
Gary Anthes is a technology writer and editor based in Arlington, VA.

© 2010 ACM 0001-0782/10/1200 $10.00
History
Building Babbage’s Analytical Engine
A British programmer and author who wants to build a computer based on 19th-century designs has about 3,000 people pledging donations to his project. John Graham-Cumming, author of The Geek Atlas, hopes to build an analytical engine invented by English mathematician Charles Babbage and first proposed in 1837. Babbage is often called “the father of computing.”
right shift, and comparison/jump operations. It will be very slow, with a single addition taking about 13,000 times as long as on a Z80, an 8-bit microprocessor from the mid-1970s.