maximal probability.) See Steyvers and
Griffiths33 for a good description of
Gibbs sampling for LDA, and see
http://CRAN.R-project.org/package=lda for a
fast open-source implementation.
Variational methods are a deterministic alternative to sampling-based
algorithms.22,35 Rather than approximating the posterior with samples,
variational methods posit a parameterized family of distributions over
the hidden structure and then find the
member of that family that is closest
to the posterior.g Thus, the inference
problem is transformed to an optimization problem. Variational methods open the door for innovations in
optimization to have practical impact
in probabilistic modeling. See Blei
et al. 8 for a coordinate ascent variational inference algorithm for LDA;
see Hoffman et al. 20 for a much faster
online algorithm (and open-source
software) that easily handles millions
of documents and can accommodate
streaming collections of text.
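
The optimization view described above can be written out as a short sketch. It assumes the notation of Equation 1 (topics \beta, topic proportions \theta, topic assignments z, observed words w); the symbol q_\nu for a member of the variational family, indexed by free parameters \nu, is introduced here only for illustration.

```latex
% Variational inference: pick the family member closest in KL divergence to the posterior.
\nu^{*} = \arg\min_{\nu} \;
  \mathrm{KL}\bigl( q_{\nu}(\beta, \theta, z) \,\|\, p(\beta, \theta, z \mid w) \bigr)

% Expanding the divergence separates out the intractable marginal likelihood:
\mathrm{KL}\bigl( q_{\nu} \,\|\, p(\cdot \mid w) \bigr)
  = \mathbb{E}_{q_{\nu}}\bigl[ \log q_{\nu}(\beta, \theta, z) \bigr]
  - \mathbb{E}_{q_{\nu}}\bigl[ \log p(\beta, \theta, z, w) \bigr]
  + \log p(w)
```

Because \log p(w) does not depend on \nu, minimizing the divergence amounts to maximizing \mathbb{E}_{q_{\nu}}[\log p(\beta, \theta, z, w)] - \mathbb{E}_{q_{\nu}}[\log q_{\nu}(\beta, \theta, z)], which involves only the joint distribution and the chosen family.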
Loosely speaking, both types of
algorithms perform a search over the
topic structure. A collection of documents (the observed random variables
in the model) is held fixed and serves
as a guide toward where to search.
Which approach is better depends
on the particular topic model being
used—we have so far focused on LDA,
but see below for other topic models—
and is a source of academic debate. For
a good discussion of the merits and
drawbacks of both, see Asuncion et al. 1
Research in topic modeling
The simple LDA model provides a powerful tool for discovering and exploiting the hidden thematic structure in
large archives of text. However, one of
the main advantages of formulating
LDA as a probabilistic model is that it
can easily be used as a module in more
complicated models for more complicated goals. Since its introduction,
LDA has been extended and adapted
in many ways.
Relaxing the assumptions of
LDA. LDA is defined by the statistical assumptions it makes about the
corpus. One active area of topic modeling research is how to relax and extend
these assumptions to uncover more
sophisticated structure in the texts.
One assumption that LDA makes is
the “bag of words” assumption, that
the order of the words in the document
does not matter. (To see this, note that
the joint distribution of Equation 1
remains invariant to permutation of
the words of the documents.) While
this assumption is unrealistic, it is reasonable if our only goal is to uncover
the coarse semantic structure of the
texts.h For more sophisticated goals—
such as language generation—it is
patently not appropriate. There have
been a number of extensions to LDA
that model words nonexchangeably.
For example, Wallach36 developed a
topic model that relaxes the bag of
words assumption by assuming that
the topics generate words conditional
on the previous word; Griffiths et al. 18
developed a topic model that switches
between LDA and a standard HMM.
These models expand the parameter
space significantly but show improved
language modeling performance.
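
The difference between an exchangeable model and one that conditions on the previous word can be seen in a toy calculation. The sketch below is not LDA or any of the cited models: it assumes a single fixed word distribution, and every probability, word, and function name in it is hypothetical. It only illustrates that a bag-of-words likelihood ignores word order, while a previous-word likelihood does not.

```python
import math

# Hypothetical word probabilities for a single toy topic.
UNIGRAM = {"gene": 0.5, "dna": 0.3, "life": 0.2}

# Hypothetical probabilities of each word given the previous word.
BIGRAM = {
    "gene": {"gene": 0.2, "dna": 0.6, "life": 0.2},
    "dna": {"gene": 0.7, "dna": 0.1, "life": 0.2},
    "life": {"gene": 0.3, "dna": 0.3, "life": 0.4},
}


def bag_of_words_log_prob(words):
    """Log probability when each word is drawn independently of its position."""
    return sum(math.log(UNIGRAM[w]) for w in words)


def previous_word_log_prob(words):
    """Log probability when each word after the first depends on its predecessor."""
    total = math.log(UNIGRAM[words[0]])
    for prev, cur in zip(words, words[1:]):
        total += math.log(BIGRAM[prev][cur])
    return total


document = ["gene", "dna", "gene", "life"]
reordered = list(reversed(document))  # same words, different order

print(bag_of_words_log_prob(document), bag_of_words_log_prob(reordered))
print(previous_word_log_prob(document), previous_word_log_prob(reordered))
```

The first pair of values agrees up to floating-point rounding; the second pair differs, because reordering changes which word precedes which. The larger table needed for the previous-word model also hints at why such extensions expand the parameter space.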
Another assumption is that the
order of documents does not matter.
Again, this can be seen by noticing
that Equation 1 remains invariant
to permutations of the ordering of
documents in the collection. This
assumption may be unrealistic when
analyzing long-running collections
that span years or centuries. In such
collections, we may want to assume
that the topics change over time.
One approach to this problem is the
dynamic topic model5—a model that
respects the ordering of the documents and gives a richer posterior
topical structure than LDA. Figure 5
shows a topic that results from analyzing all of Science magazine under the
dynamic topic model. Rather than a
single distribution over words, a topic
is now a sequence of distributions
over words. We can find an underlying
theme of the collection and track how
it has changed over time.
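
To make "a sequence of distributions over words" concrete, here is a minimal generative sketch. It assumes the construction commonly used for the dynamic topic model, which this section does not spell out: unconstrained per-word parameters that drift by a Gaussian random walk from one time slice to the next and are mapped to probabilities with a softmax. The function names, vocabulary size, and drift scale are illustrative, and the code simulates a topic rather than inferring one from data.

```python
import math
import random


def softmax(params):
    """Map unconstrained real parameters to a probability distribution."""
    peak = max(params)  # subtract the maximum for numerical stability
    weights = [math.exp(p - peak) for p in params]
    total = sum(weights)
    return [w / total for w in weights]


def simulate_topic_over_time(vocab_size, num_slices, drift_sd, seed=0):
    """Return one topic as a list of word distributions, one per time slice."""
    rng = random.Random(seed)
    params = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
    topic_sequence = [softmax(params)]
    for _ in range(num_slices - 1):
        # Random-walk step: each parameter drifts slightly from its previous value.
        params = [p + rng.gauss(0.0, drift_sd) for p in params]
        topic_sequence.append(softmax(params))
    return topic_sequence


topic = simulate_topic_over_time(vocab_size=5, num_slices=4, drift_sd=0.3)
for t, dist in enumerate(topic):
    print(t, [round(p, 3) for p in dist])
```

Each printed row is the same topic at a different time slice. A small drift_sd keeps neighboring slices similar, which is what lets a theme be tracked as it changes.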
A third assumption about LDA is
that the number of topics is assumed
g Closeness is measured with Kullback–Leibler
divergence, an information-theoretic measurement
of the distance between two probability
distributions.
h As a thought experiment, imagine shuffling
the words of the article in Figure 1. Even when
shuffled, you would be able to glean that the
article has something to do with genetics.