structure provides an intuitive language for expressing
domain-specific knowledge about variable relationships
and facilitates the transfer of modeling advances to new
applications.
Several different formalisms have been proposed that
use graphs to represent probability distributions. 28, 30, 47, 50
Directed graphical models, or Bayesian networks, are widely
used in artificial intelligence to encode causal, generative
processes. Such directed graphs provided the basis for the
earliest versions of the BP algorithm. 37 Alternatively,
undirected graphical models, or Markov random fields (MRFs),
provide popular models for the symmetric dependencies
arising in such areas as signal processing, spatial statistics,
bioinformatics, and computer vision.
2. 1. Pairwise markov random fields
An undirected graph G is defined by a set of nodes V and a
corresponding set of undirected edges E (see Figure 1). Let
Γ(i ) ∆ = { j | (i, j ) Î E} denote the neighborhood of a node i Î V.
MRFs associate each node i Î V with an unobserved, or hidden, random variable xi Î Xi. Let x = {xi | i Î V} denote the
set of all hidden variables. Given evidence or observations y,
a pairwise MRF represents the posterior distribution p(x | y)
in factored form:
Here, the proportionality sign indicates that p(x, y) may
only be known up to an uncertain normalization constant,
chosen so that it integrates to one. The positive potential
functions ψij(xi, xj) > 0 can often be interpreted as soft, local
constraints. Note that the local evidence potential ψi(xi, y)
does not typically equal the marginal distribution p(xi | y),
due to interactions with other potentials.
2. 2. Belief propagation
For graphs that are acyclic or tree-structured, the desired
conditional distributions p(xi | y) can be directly calculated
by a local message-passing algorithm known as belief propagation (BP). 37, 50 At each iteration of the BP algorithm, nodes
j Î V calculate messages mji(xi) to be sent to each neighboring node i Î G( j ):
The outgoing message is a positive function defined on Xi.
Intuitively, it is a (possibly approximate) sufficient statistic
of the information that node j has collected regarding xi. At
any iteration, each node can produce an approximation qi(xi)
to the marginal distribution p(xi | y) by combining incoming
messages with the local evidence potential:
These updates are graphically summarized in Figure 2.
For tree-structured graphs, the approximate marginals,
or beliefs, qi (xi) will converge to the true marginals p(xi | y)
once messages from each node have propagated across
the graph. With an efficient update schedule, the messages for each distinct edge need only be computed once,
and BP can be seen as a distributed variant of dynamic
programming.
figure 2. message-passing recursions underlying the BP algorithm. Left: approximate marginal (belief) estimates combine the local
observation potential with messages from neighboring nodes. Right: a new outgoing message (red) is computed from all other incoming
messages (blue).
y
y
qi (xi) ∝ ψi(xi, y) Π
xi
xi
xj
mji(xi) ∝ ÚXj ψij(xi, xj)ψj(xj, y) Π k∈Γ( j)\ imkj(xj) dxj