that messages mji(xi) are approximated by a set of weighted,
discrete samples. If Xi is continuous and these messages are
constructed from independent proposal distributions, their
particles will be distinct with probability one. For the message product operation underlying the BP algorithm to produce sensible results, some interpolation of these samples
to nearby states is thus needed.
We accomplish this interpolation, and ensure that messages are smooth and strictly positive, by convolving raw particle sets with a Gaussian distribution, or kernel:

mji(xi) = Σl wji(l) N(xi; xji(l), Λji)    (9)

Here, N(x; μ, Λ) denotes a normalized Gaussian density with mean μ and covariance Λ, evaluated at x. As detailed
later, we use methods from the nonparametric density
estimation literature to construct these mixture approximations.42 The product of two Gaussians is itself a scaled Gaussian, a fact that simplifies our later computations.
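The Gaussian product identity can be checked numerically. The following Python sketch (illustrative, not from the original implementation) multiplies two 1D Gaussian densities and compares the result against the closed-form scaled Gaussian:

```python
import numpy as np

def gauss(x, mean, var):
    """Normalized 1D Gaussian density N(x; mean, var)."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gauss_product(m1, v1, m2, v2):
    """N(x; m1, v1) * N(x; m2, v2) = z * N(x; m, v), in closed form."""
    v = 1.0 / (1.0 / v1 + 1.0 / v2)   # precisions add
    m = v * (m1 / v1 + m2 / v2)       # precision-weighted mean
    z = gauss(m1, m2, v1 + v2)        # scale: a Gaussian in the two means
    return z, m, v

# Check the identity pointwise on a grid.
z, m, v = gauss_product(0.0, 1.0, 2.0, 4.0)
x = np.linspace(-3.0, 5.0, 17)
assert np.allclose(gauss(x, 0.0, 1.0) * gauss(x, 2.0, 4.0), z * gauss(x, m, v))
```

Because the scale z is itself a Gaussian in the two means, products of mixtures can be expanded component by component in closed form.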
3.2. Message fusion
We begin by assuming that the observation potential is a
Gaussian mixture. Such representations arise naturally
from learning-based approaches to model identification.14
The BP belief update of Equation 3 is then defined by a
product of d = (|Γ(i)| + 1) mixtures: the observation potential ψi(xi, y), and messages mji(xi) as in Equation 9 from
each neighbor. As illustrated in Figure 3, the product of d
Gaussian mixtures, each containing L components, is itself
a mixture of L^d Gaussians. While in principle this belief update could be performed exactly, the exponential growth in the number of mixture components quickly becomes intractable.
The NBP algorithm instead approximates the product
mixture via a collection of L independent, possibly importance weighted samples from the “ideal”
belief of Equation 3. Given these samples, the bandwidth
Λi of the nonparametric belief estimate (Equation 10) is determined via a method from the extensive kernel density estimation literature.42 For example, the simple “rule
of thumb” method combines a robust covariance estimate
with an asymptotic formula that assumes the target density
is Gaussian. While fast to compute, it often oversmooths
multimodal distributions. In such cases, more sophisticated cross-validation schemes can improve performance.

Figure 3. A product of three mixtures of L = 4 one-dimensional Gaussians. Although the 4^3 = 64 components in the product density (thin lines) vary widely in position and weight (scaled for visibility), their normalized sum (thick line) has a simple form.
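As a concrete illustration of the rule-of-thumb approach, the following Python sketch implements one common variant (Silverman's rule, here for unweighted 1D samples; the weighted, multivariate setting used by NBP is more general):

```python
import numpy as np

def rule_of_thumb_bandwidth(samples):
    """Silverman's rule-of-thumb bandwidth for 1D, equally weighted samples.

    Combines a robust spread estimate (the smaller of the standard deviation
    and the scaled interquartile range) with an asymptotic formula that is
    tuned to Gaussian target densities.
    """
    n = len(samples)
    std = np.std(samples, ddof=1)
    iqr = np.subtract(*np.percentile(samples, [75, 25]))
    return 0.9 * min(std, iqr / 1.34) * n ** (-0.2)

def kde(x, samples, h):
    """Kernel density estimate with Gaussian kernels of bandwidth h."""
    diffs = (np.asarray(x)[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * diffs ** 2).sum(axis=1) / (len(samples) * h * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
samples = rng.normal(0.0, 1.0, size=500)
h = rule_of_thumb_bandwidth(samples)
grid = np.linspace(-5.0, 5.0, 1001)
print((kde(grid, samples, h) * (grid[1] - grid[0])).sum())  # total mass ≈ 1.0
```

On well-separated multimodal targets the same asymptotic formula tends to select too large a bandwidth, which is the oversmoothing behavior noted above.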
In many applications, NBP’s computations are dominated by the cost of sampling from such products of
Gaussian mixtures. Exact sampling by explicit construction of the product distribution requires O(L^d) operations.
Fortunately, a number of efficient approximate samplers
have been developed. One simple but sometimes effective approach uses an evenly weighted mixture of the d
input distributions as an importance sampling proposal.
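A minimal sketch of this proposal strategy in Python, for 1D mixtures (the function names and the (means, variances, weights) representation are illustrative, not from the original implementation):

```python
import numpy as np

def mixture_pdf(x, means, variances, weights):
    """Density of a 1D Gaussian mixture, evaluated at the points x."""
    x = np.asarray(x)[:, None]
    comps = np.exp(-0.5 * (x - means) ** 2 / variances) / np.sqrt(2 * np.pi * variances)
    return comps @ weights

def sample_mixture(rng, n, means, variances, weights):
    """Draw n samples from a 1D Gaussian mixture."""
    idx = rng.choice(len(weights), size=n, p=weights)
    return rng.normal(means[idx], np.sqrt(variances[idx]))

def product_importance_sample(rng, mixtures, n):
    """Importance-weighted samples targeting a product of d Gaussian mixtures.

    Proposal: an evenly weighted mixture of the d inputs.
    Weights:  product of all input densities / proposal density.
    """
    d = len(mixtures)
    which = rng.integers(d, size=n)      # pick an input mixture uniformly
    x = np.empty(n)
    for j, (m, v, w) in enumerate(mixtures):
        mask = which == j
        x[mask] = sample_mixture(rng, int(mask.sum()), m, v, w)
    dens = np.stack([mixture_pdf(x, m, v, w) for m, v, w in mixtures])
    weights = dens.prod(axis=0) / dens.mean(axis=0)
    return x, weights / weights.sum()

# Example: the product of N(0, 1) and N(2, 4) is Gaussian with mean 0.4.
rng = np.random.default_rng(1)
mix_a = (np.array([0.0]), np.array([1.0]), np.array([1.0]))
mix_b = (np.array([2.0]), np.array([4.0]), np.array([1.0]))
x, w = product_importance_sample(rng, [mix_a, mix_b], 20000)
print((w * x).sum())   # self-normalized estimate of the product's mean, ≈ 0.4
```

The proposal covers every mode that any single input mixture covers, but the importance weights degrade when the product concentrates in regions where all inputs are simultaneously small.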
For higher-dimensional variables, iterative Gibbs sampling algorithms are often more effective.44 Multiscale
KD-tree density representations can improve the mixing
rate of Gibbs samplers, and also lead to “epsilon-exact”
samplers with accuracy guarantees.25 More sophisticated
importance samplers5 and multiscale simulated or parallel tempering algorithms39 can also be highly effective.
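A label-based Gibbs sampler for such products can be sketched as follows for 1D mixtures with d ≥ 2 (an illustrative implementation, not the authors' code): conditioned on the components currently selected from the other mixtures, which multiply into a single Gaussian, each label's conditional distribution has a closed form.

```python
import numpy as np

def gibbs_product_sample(rng, mixtures, sweeps=10):
    """Draw one sample from a product of d >= 2 one-dimensional Gaussian mixtures.

    Each mixture is a (means, variances, weights) triple. The sampler runs
    Gibbs sweeps over one component label per input mixture.
    """
    d = len(mixtures)
    labels = [rng.integers(len(w)) for (_, _, w) in mixtures]
    for _ in range(sweeps):
        for j in range(d):
            # Gaussian formed by the currently selected components of the others.
            prec = sum(1.0 / mixtures[k][1][labels[k]] for k in range(d) if k != j)
            mu = sum(mixtures[k][0][labels[k]] / mixtures[k][1][labels[k]]
                     for k in range(d) if k != j) / prec
            m, v, w = mixtures[j]
            # P(label = l | rest) is proportional to w[l] * N(mu; m[l], 1/prec + v[l]).
            s = 1.0 / prec + v
            logp = np.log(w) - 0.5 * (mu - m) ** 2 / s - 0.5 * np.log(s)
            p = np.exp(logp - logp.max())
            labels[j] = rng.choice(len(w), p=p / p.sum())
    # Given all labels, the product of the selected Gaussians is one Gaussian.
    prec = sum(1.0 / mixtures[k][1][labels[k]] for k in range(d))
    mu = sum(mixtures[k][0][labels[k]] / mixtures[k][1][labels[k]]
             for k in range(d)) / prec
    return rng.normal(mu, np.sqrt(1.0 / prec))
```

Each sweep costs O(dL) rather than the O(L^d) of exact enumeration; the number of sweeps trades off cost against mixing.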
Yet more approaches improve efficiency by introducing
an additional message approximation step.19,22,31 By first
reducing the complexity of each message, the product can
be approximated more quickly, or even computed exactly.
When ψi(xi, y) is a non-Gaussian analytic function, we can use any of these samplers to construct an importance sampling proposal from the incoming Gaussian mixture messages.
3.3. Message propagation
The particle filter of Section 2.4 propagates belief estimates to subsequent time steps by sampling xt+1(l) ∼ p(xt+1 | xt(l)).
The consistency of this procedure depends critically on the
HMM’s factorization into properly normalized conditional
distributions, so that ∫ p(xt+1 | xt) dxt+1 = 1 for all xt ∈ Xt. By definition, such conditional distributions place no biases on xt.
In contrast, for pairwise MRFs, the clique potential ψij(xi, xj)
is an arbitrary nonnegative function that may influence the
values assumed by either linked variable. To account for this, we quantify the marginal influence of ψij(xi, xj) on xj via the following function:

ψ̄ij(xj) = ∫ ψij(xi, xj) dxi
If ψij(xi, xj) is a Gaussian mixture, ψ̄ij(xj) is simply the mixture
obtained by marginalizing each component. In the common
case where ψij(xi, xj) = ψij(xi − xj) depends only on the difference between neighboring variables, the marginal influence
is constant and may be ignored.
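The component-wise marginalization can be checked numerically. In this Python sketch (illustrative, not from the original implementation), a two-component 2D Gaussian-mixture potential is marginalized analytically, then compared against direct grid quadrature over xi:

```python
import numpy as np

def gauss1(x, mean, var):
    """Normalized 1D Gaussian density."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def gauss2(xi, xj, mean, cov):
    """Normalized 2D Gaussian density, evaluated elementwise on (xi, xj)."""
    d = np.stack([xi - mean[0], xj - mean[1]], axis=-1)
    quad = np.einsum('...i,ij,...j->...', d, np.linalg.inv(cov), d)
    return np.exp(-0.5 * quad) / (2 * np.pi * np.sqrt(np.linalg.det(cov)))

# A two-component Gaussian-mixture potential psi_ij(x_i, x_j).
weights = np.array([0.6, 0.4])
means = [np.array([0.0, 1.0]), np.array([2.0, -1.0])]
covs = [np.array([[1.0, 0.3], [0.3, 0.5]]),
        np.array([[0.8, -0.2], [-0.2, 1.2]])]

def marginal_influence(xj):
    """psi_bar_ij(x_j): each component keeps its weight, x_j mean, and variance."""
    return sum(w * gauss1(xj, m[1], C[1, 1])
               for w, m, C in zip(weights, means, covs))

# Check against direct numerical integration over x_i at one value of x_j.
xi = np.linspace(-8.0, 10.0, 2001)
xj = 0.5
psi = sum(w * gauss2(xi, np.full_like(xi, xj), m, C)
          for w, m, C in zip(weights, means, covs))
print(psi.sum() * (xi[1] - xi[0]), marginal_influence(xj))  # the two agree
```

Each mixture component's marginal keeps its weight and simply drops the xi block of its mean and covariance, so the marginal influence costs nothing extra to represent.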