Motion. Frame-to-frame comparison against a learned background
model is an effective and computationally efficient method for finding
foreground objects and for observing
their position and movement. This
comparison relies on several assumptions: a static background and
a stationary camera (or image pre-processing to stabilize the
video); for example, Kang et al.21 employed the Lucas-Kanade tracking method.29
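
The comparison against a learned background can be sketched in a few lines. This is a minimal illustration, not the method of any cited work: it assumes grayscale frames stored as nested lists, a running-average background model, and hypothetical values for the learning rate `alpha` and the difference `threshold`.

```python
def update_background(background, frame, alpha=0.05):
    """Blend the new frame into the running-average background model."""
    return [
        [(1 - alpha) * b + alpha * f for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]

def foreground_mask(background, frame, threshold=25):
    """Mark pixels whose intensity differs from the model by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(brow, frow)]
        for brow, frow in zip(background, frame)
    ]

def centroid(mask):
    """Return the (row, col) center of the foreground pixels, or None if there are none."""
    points = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    if not points:
        return None
    n = len(points)
    return (sum(r for r, _ in points) / n, sum(c for _, c in points) / n)

# Synthetic 4x6 scene: a uniform static background seen by a stationary camera.
empty = [[10] * 6 for _ in range(4)]
background = [[float(v) for v in row] for row in empty]
for _ in range(20):  # learn the background from object-free frames
    background = update_background(background, empty)

def frame_with_object(col):
    """Hypothetical test frame: a bright two-pixel object in the given column."""
    frame = [row[:] for row in empty]
    frame[1][col] = frame[2][col] = 200
    return frame

# Track the object as it moves one column per frame.
positions = [
    centroid(foreground_mask(background, frame_with_object(c))) for c in (1, 2, 3)
]
print(positions)
```

The sequence of centroids gives both position and movement. If the camera moved, every pixel would differ from the model, which is why the stationary-camera assumption (or prior stabilization) matters.
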
Depth. Range data from a calibrated
camera pair40 or direct range sensors
(such as LiDAR) is a particularly useful cue if the user is expected to face
the camera(s) and the hands are considered the closest object. Depth from
stereo is usually coarse-grained and rather noisy, so it is often combined with
other image cues (such as color17, 22, 33).
Well-calibrated stereo cameras are
costly, and depth can be calculated accurately only if the scene contains sufficient texture. If texture is lacking, artificial texture can be projected into the
scene through a digital light projector
injecting structured light patterns.39
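
As a rough illustration of the "hands are the closest object" assumption, the sketch below converts stereo disparity to depth with the standard pinhole relation Z = f * B / d and keeps only the pixels near the smallest depth. The focal length, baseline, disparity values, and the 0.15 m band are hypothetical numbers chosen for the example.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo relation Z = f * B / d; None where no match was found."""
    if disparity_px <= 0:
        return None
    return focal_px * baseline_m / disparity_px

def nearest_object_mask(depth_map, band_m=0.15):
    """Keep pixels whose depth lies within band_m of the smallest valid depth."""
    valid = [z for row in depth_map for z in row if z is not None]
    if not valid:
        return [[False] * len(row) for row in depth_map]
    nearest = min(valid)
    return [
        [z is not None and z - nearest <= band_m for z in row]
        for row in depth_map
    ]

# Hypothetical rig: 500-pixel focal length, 10 cm baseline.
FOCAL_PX = 500.0
BASELINE_M = 0.1

# Disparity in pixels; 0 marks a textureless pixel with no stereo match.
disparity = [
    [25, 25, 0],
    [25, 100, 100],
    [25, 100, 25],
]
depth_map = [
    [disparity_to_depth(d, FOCAL_PX, BASELINE_M) for d in row] for row in disparity
]
hand_mask = nearest_object_mask(depth_map)
for row in hand_mask:
    print(row)
```

Pixels without a match stay undefined rather than being guessed, which mirrors the point that depth is reliable only where the scene has enough texture.
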
Color. Heads and hands can be found
with reasonable accuracy based purely
on their color.24, 40 Skin color occupies
a rather well-defined area in color
spaces (such as Hue, Saturation, and
Intensity; L*a*b*; and YIQ), so it can be
used for segmentation (see Figure 2
and Hasanuzzaman et al.,19 Rogalla et
al.,41 and Yin and Zhu53). Combined
histogram matching and blob tracking
with Camshift7 or the Viterbi
algorithm54 is a popular approach due
to its speed, ease of implementation,
and performance. Shortcomings stem
from confusion with similar-colored
objects in the background and
limitations with respect to posture
recognition. Better optics and sensors often
improve color saturation and therefore
the performance of color-based algorithms;
another accuracy boost can be achieved through
user-worn markers (such as colored
gloves and bright LEDs). While
simplifying the interface implementation,
these aids do not permit users to
“come as you are,” so IR illumination
can be used instead of markers. The IR
light source illuminates users’ hands,
allowing an IR camera to capture
images of the illuminated parts.44 In
addition, reflective material affixed to
a body part can increase the part’s
reflectivity.
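
A minimal sketch of skin-color segmentation follows, using hue, saturation, and value from Python's standard `colorsys` module. The threshold values and the sample pixel colors are illustrative assumptions, not figures from the cited work; a practical system would learn them from training data, for example as a color histogram.

```python
import colorsys

# Hypothetical skin thresholds in HSV (all components scaled to 0..1).
HUE_MAX = 50 / 360        # reddish to yellowish hues
SAT_RANGE = (0.20, 0.70)  # neither gray nor fully saturated
VAL_MIN = 0.35            # reject very dark pixels

def is_skin(rgb):
    """Classify one 8-bit (R, G, B) pixel with fixed HSV thresholds."""
    r, g, b = (c / 255.0 for c in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return h <= HUE_MAX and SAT_RANGE[0] <= s <= SAT_RANGE[1] and v >= VAL_MIN

def skin_mask(image):
    """Apply the per-pixel test to an image stored as nested lists of RGB tuples."""
    return [[is_skin(px) for px in row] for row in image]

# Sample colors chosen for illustration only.
SKIN = (224, 172, 105)
SKY = (70, 130, 200)
WALL = (240, 240, 240)
WOOD = (181, 135, 99)  # a similar-colored background object

image = [
    [SKIN, SKY],
    [WALL, WOOD],
]
mask = skin_mask(image)
for row in mask:
    print(row)
```

The wood-colored pixel passes the same test as the skin pixel, a small instance of the confusion with similar-colored background objects noted above.
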
Figure 3. Hand-gesture recognition using
color segmentation, conversion into polar
coordinates, and maxima detection to
identify and count fingers.
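
The polar-coordinate and maxima-detection steps named in the Figure 3 caption could look roughly like the sketch below. The details are assumptions, since the caption gives none: the hand contour is taken as already segmented, the centroid serves as the polar origin, and a finger is any local radius maximum exceeding 1.5 times the median radius. The `synthetic_hand` helper is a hypothetical test shape, not real image data.

```python
import math
import statistics

def to_polar(contour):
    """Convert (x, y) contour points to (angle, radius) about their centroid, sorted by angle."""
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    polar = [(math.atan2(y - cy, x - cx), math.hypot(x - cx, y - cy)) for x, y in contour]
    return sorted(polar)

def count_fingers(contour, ratio=1.5):
    """Count local radius maxima that exceed ratio times the median radius."""
    radii = [r for _, r in to_polar(contour)]
    threshold = ratio * statistics.median(radii)
    n = len(radii)
    peaks = 0
    for i in range(n):
        # Neighbors wrap around because the contour is a closed curve.
        prev_r, next_r = radii[i - 1], radii[(i + 1) % n]
        if radii[i] > threshold and radii[i] > prev_r and radii[i] >= next_r:
            peaks += 1
    return peaks

def synthetic_hand(finger_angles_deg, width_deg=6.0):
    """Hypothetical test shape: a unit-radius palm with one Gaussian bump per finger."""
    contour = []
    for deg in range(360):
        r = 1.0 + sum(
            math.exp(-0.5 * ((deg - a) / width_deg) ** 2) for a in finger_angles_deg
        )
        theta = math.radians(deg)
        contour.append((r * math.cos(theta), r * math.sin(theta)))
    return contour

print(count_fingers(synthetic_hand([60, 90, 120])))
```

A contour extracted from a real image would be noisy, so the radius profile would likely need smoothing before the maxima are counted.
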
Figure 4. Multi-cue hand tracking and
posture recognition.24