Figure 5. Detection of a toy, four-armed articulated object (top row) in clutter. We show NBP estimates after 0, 1, and 3 iterations (columns), for cases where the circular central joint is either visible (middle row) or occluded (bottom row).
may take any 3D position and orientation. 48 The graphical models in Figure 6 instead encode hand pose via the
3D pose of 16 rigid bodies. 45 Analytic pairwise potentials
then capture kinematic constraints (phalanges are connected by revolute joints), structural constraints (two
fingers cannot occupy the same 3D volume), and Markov
temporal dynamics. The geometry of individual rigid bodies is modeled via quadric surfaces (a standard approach
in computer graphics), and related to observed images via
statistical color and edge cues. 45
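As a rough sketch of this structure (illustrative only, not the authors' implementation), the model can be viewed as a graph whose 16 nodes are the rigid bodies and whose edges are labeled by the constraint they encode; the node names and the particular structural edges below are hypothetical choices.

```python
# Illustrative sketch of the hand model's pairwise graph: 16 rigid-body nodes,
# edges tagged by the constraint they encode. Names and edge choices are hypothetical.
import itertools

PALM = "palm"
FINGERS = ["thumb", "index", "middle", "ring", "pinky"]
PHALANGES = 3  # three rigid phalanges per finger

nodes = [PALM] + [f"{f}_{k}" for f in FINGERS for k in range(PHALANGES)]
assert len(nodes) == 16  # palm + 5 fingers x 3 phalanges

edges = []
# Kinematic edges: revolute joints connecting the palm to each proximal
# phalanx, and successive phalanges within a finger.
for f in FINGERS:
    edges.append((PALM, f"{f}_0", "kinematic"))
    for k in range(PHALANGES - 1):
        edges.append((f"{f}_{k}", f"{f}_{k + 1}", "kinematic"))
# Structural edges: pairs of bodies that must not occupy the same 3D volume
# (shown here only between fingertips of different fingers, for brevity).
for f, g in itertools.combinations(FINGERS, 2):
    edges.append((f"{f}_2", f"{g}_2", "structural"))
# Temporal edges would additionally link each body to its counterpart at the
# previous video frame to encode Markov dynamics.
```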
Because different fingers are locally similar in appearance, global inference is needed to accurately associate hand components with image cues. Discretizing the 6D pose variable of each rigid body is intractable, but as illustrated in Figure 6, NBP's sampling-based message approximations often lead to accurate hand localization and tracking. While we project particle outlines onto the image plane for visualization, we emphasize that NBP's estimates are of 3D pose.
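To make the sampling-based message representation concrete, the following sketch shows one plausible way to store an NBP message over a continuous 6D pose as a weighted particle set smoothed by a Gaussian kernel; the class name, the isotropic kernel, and the simple 6D parameterization are illustrative assumptions rather than details of the hand tracker.

```python
# Illustrative particle-based (kernel density) message for a continuous 6D pose,
# the representation NBP uses instead of discretizing each rigid body's pose.
# Class name, isotropic Gaussian kernel, and pose parameterization are assumptions.
import numpy as np

class ParticleMessage:
    def __init__(self, particles, weights, bandwidth):
        self.particles = np.asarray(particles, dtype=float)  # (M, 6): 3D position + 3 orientation params
        w = np.asarray(weights, dtype=float)
        self.weights = w / w.sum()                            # normalized particle weights
        self.bandwidth = bandwidth                            # Gaussian kernel width

    def evaluate(self, x):
        """Kernel density estimate of the message at a candidate pose x."""
        d2 = np.sum((self.particles - x) ** 2, axis=1)
        return float(np.sum(self.weights * np.exp(-0.5 * d2 / self.bandwidth ** 2)))

    def sample(self, rng):
        """Draw a pose by choosing a particle and perturbing it with the kernel."""
        i = rng.choice(len(self.weights), p=self.weights)
        return self.particles[i] + self.bandwidth * rng.standard_normal(6)

# Example usage with random particles.
rng = np.random.default_rng(0)
msg = ParticleMessage(rng.standard_normal((200, 6)), np.ones(200), bandwidth=0.3)
print(msg.evaluate(np.zeros(6)), msg.sample(rng))
```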
Finally, Figure 7 illustrates a complementary approach
to multicamera tracking of 3D person motion. 41 While the
hand tracker used rigid kinematic potentials, this graphical model of full-body pose is explicitly "loose-limbed," and uses pairwise potentials estimated from calibrated 3D motion capture data. Even without the benefit of
dynamical cues or highly accurate image-based likelihoods, we see that NBP successfully infers the full human
body pose.
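As an illustration of how such potentials might be learned (a sketch under simplifying assumptions, not the cited system's procedure), one can fit a Gaussian mixture to the relative poses of two connected limbs observed across motion capture sequences, here using a naive 6D pose-difference feature and scikit-learn.

```python
# Sketch: estimating a "loose-limbed" pairwise potential from motion capture data
# by fitting a Gaussian mixture to relative poses of two connected limbs.
# The 6D pose-difference feature and mixture size are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_pairwise_potential(poses_a, poses_b, n_components=5):
    """poses_a, poses_b: (N, 6) arrays of limb poses from calibrated mocap sequences."""
    relative = poses_b - poses_a                      # crude relative-pose feature
    gmm = GaussianMixture(n_components=n_components).fit(relative)

    def potential(x_a, x_b):
        # psi(x_a, x_b): mixture density evaluated at the relative pose x_b - x_a.
        return float(np.exp(gmm.score_samples((x_b - x_a).reshape(1, -1)))[0])

    return potential
```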
Figure 6. Articulated 3D hand tracking with NBP. Top: Graphical models capturing the kinematic, structural, and temporal constraints relating the hand's 16 rigid bodies. Middle: Given a single input image, projected estimates of hand pose after one (left) and four (right) NBP iterations. Bottom: Two frames showing snapshots of tracking performance from a monocular video sequence.

4.2. Sensor self-localization
Another application well matched to NBP is sensor localization. 22 One of the critical first tasks in
using ad-hoc networks of wireless sensors is to determine the location of each sensor; the high cost of manual
calibration or specialized hardware like GPS units makes
self-localization, or estimating position based on local in-network information, very appealing. As with articulated
tracking, we will be estimating the position of a number
of objects (sensors) using joint information about the
objects’ relative positions. Specifically, let us assume that
some subset of pairs of sensors (i, j) ∈ E are able to measure a noisy estimate of their relative distance (e.g., through the signal strength of wireless communication or the time delays of acoustic signals). Our measurements y_ij tell us something about the relative positions x_i, x_j of two sensors; assuming independent noise, the likelihood of our measurements is

p(y | x) = ∏_{(i,j) ∈ E} p(y_ij | x_i, x_j).
We can see immediately that this likelihood has the form
of a pairwise graphical model whose edges are the pairs of
sensors with distance measurements. Typically we assume
a small number of anchor sensors with known or partially
known position to remove translational, rotational, and
mirror image ambiguity from the geometry.
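As a concrete sketch, assuming Gaussian noise on the measured distances and treating sensor positions as Euclidean coordinates (both are illustrative modeling choices, not prescribed by the measurement setup above), the log-likelihood factors over the measured pairs as follows.

```python
# Minimal sketch of the sensor localization likelihood, assuming Gaussian noise
# on measured inter-sensor distances (sigma, names, and 2D positions are
# illustrative choices, not part of the measurement model described above).
import numpy as np

def log_likelihood(positions, measurements, sigma=0.1):
    """positions: dict sensor id -> coordinate array (anchors have fixed, known entries).
    measurements: dict (i, j) -> noisy distance y_ij, one entry per edge in E."""
    ll = 0.0
    for (i, j), y_ij in measurements.items():
        d = np.linalg.norm(positions[i] - positions[j])
        ll += -0.5 * ((y_ij - d) / sigma) ** 2        # one pairwise factor p(y_ij | x_i, x_j)
    return ll

# Example: three sensors, two of which are anchors with known positions.
positions = {0: np.array([0.0, 0.0]), 1: np.array([1.0, 0.0]), 2: np.array([0.4, 0.8])}
measurements = {(0, 1): 1.02, (0, 2): 0.91, (1, 2): 0.97}
print(log_likelihood(positions, measurements))
```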