and then finds the best matches, as
previously described. In areas where
there is little texture—for example, a
blank wall—the calculated matches
are unreliable, so all such areas are
marked to be ignored in later processing. As the disparity values are expected to change significantly near object
borders, the speckle-filtering stage
eliminates speckle noise within large
continuous regions of the disparity image.
Unfortunately, the speckle-filtering algorithm requires a stack-based depth-first search that is difficult to parallelize, so it
is run on the CPU. The results are visualized using a false-color image.
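To make the stack-based depth-first search concrete, here is a minimal pure-Python sketch of a speckle filter. It is an illustration under stated assumptions, not the implementation discussed in this article: the function name `filter_speckles`, the parameters `max_speckle_size` and `max_diff`, the 4-neighbor connectivity, and the `invalid` marker value are all hypothetical choices made for the example.

```python
def filter_speckles(disp, max_speckle_size, max_diff, invalid=-1):
    """Invalidate small connected regions of similar disparity, in place.

    disp is a list of rows of integer disparities (assumed layout).
    Two 4-connected neighbors belong to the same region when their
    disparities differ by at most max_diff.
    """
    rows, cols = len(disp), len(disp[0])
    seen = [[False] * cols for _ in range(rows)]
    for r0 in range(rows):
        for c0 in range(cols):
            if seen[r0][c0] or disp[r0][c0] == invalid:
                continue
            # Depth-first flood fill with an explicit stack. Each step
            # depends on the pixels visited before it, which is why this
            # search is hard to spread across many GPU threads.
            stack = [(r0, c0)]
            seen[r0][c0] = True
            region = []
            while stack:
                r, c = stack.pop()
                region.append((r, c))
                for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not seen[nr][nc]
                            and disp[nr][nc] != invalid
                            and abs(disp[nr][nc] - disp[r][c]) <= max_diff):
                        seen[nr][nc] = True
                        stack.append((nr, nc))
            # Small isolated regions are treated as speckle noise.
            if len(region) <= max_speckle_size:
                for r, c in region:
                    disp[r][c] = invalid
    return disp


# A single outlier inside a smooth region is removed; the region is kept.
example = [
    [10, 10, 10, 10],
    [10, 50, 10, 10],
    [10, 10, 10, 10],
]
filter_speckles(example, max_speckle_size=2, max_diff=1)
```

The sequential dependency is visible in the `while stack` loop: the set of pixels reached next is known only after the current pixel has been examined.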
All the steps except speckle filtering are implemented on the GPU. The
most compute-intensive step is block
matching. On an NVIDIA GTX580 it runs
seven times faster than a CPU
implementation on a quad-core Intel
i5-760 2.8GHz processor with SSE and
TBB optimizations. After this speedup,
speckle filtering becomes the bottleneck, consuming 50% of the frame-processing time.
An elegant parallel-processing solution is to run speckle filtering on the
CPU in parallel with the GPU processing. While the GPU processes the next
frame, the CPU performs speckle filtering for the current frame. This can
be done using asynchronous OpenCV
GPU and CUDA capabilities. The heterogeneous CPU/GPU system now
provides a sevenfold speedup for the
high-resolution stereo correspondence
problem, allowing real-time (24fps)
performance at full HD resolution.
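The overlap described above can be sketched as a two-stage pipeline. The example below is a simplified illustration, not the OpenCV GPU or CUDA asynchronous API: `gpu_stage` and `cpu_speckle_stage` are hypothetical stand-ins that only mimic the data flow, and a worker thread plays the role of the GPU. In a real system the GPU work would be launched asynchronously and would actually run concurrently with the CPU code.

```python
from concurrent.futures import ThreadPoolExecutor


def gpu_stage(frame):
    # Hypothetical stand-in for the GPU steps (for example, block matching).
    return [2 * v for v in frame]


def cpu_speckle_stage(disparity):
    # Hypothetical stand-in for CPU speckle filtering: drop negative values.
    return [v for v in disparity if v >= 0]


def run_pipeline(frames):
    """Process frames so that the 'GPU' works on frame n+1 while the
    CPU post-processes frame n."""
    results = []
    with ThreadPoolExecutor(max_workers=1) as gpu_worker:
        pending = None  # future holding the GPU result of the previous frame
        for frame in frames:
            # Start the GPU stage for the next frame right away.
            next_job = gpu_worker.submit(gpu_stage, frame)
            if pending is not None:
                # Meanwhile, finish the previous frame on the CPU.
                results.append(cpu_speckle_stage(pending.result()))
            pending = next_job
        if pending is not None:
            # Drain the pipeline: the last frame still needs its CPU stage.
            results.append(cpu_speckle_stage(pending.result()))
    return results


outputs = run_pipeline([[1, -2, 3], [4, 5, -6]])
```

If the two stages take roughly equal time, as when speckle filtering consumes 50% of the frame-processing time, the ideal per-frame cost of the pipelined version is the duration of one stage rather than the sum of both, at the price of one frame of added latency.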
KinectFusion
Microsoft’s KinectFusion4 is an example of an application that previously
required slow batch processing but
now, when powered by GPUs, can be
run at interactive speeds. Kinect is a
camera that produces color and depth
images. Simply by moving the Kinect device around, one can digitize the 3D
geometry of indoor scenes with amazing fidelity, as illustrated in Figure 5.
An open source implementation of
such a scanning application is based
on the Point Cloud Library,6 a companion library to OpenCV that uses 3D
points and voxels instead of 2D pixels
as basic primitives.
Implementing KinectFusion is not
a simple task. Kinect does not return
range measurements for all the pixels,
and it works reliably only on continuous
smooth matte surfaces. The
range measurements that it returns
are noisy, and depending on the surface
shapes and reflectance properties,
the noise can be significant. The noise
also increases with the distance to the
measured surface. Kinect generates a
new depth frame 30 times per second.
If the user moves the Kinect device
too fast, the algorithm gets confused
and cannot track the motion using the
range data. With a clever combination
of good algorithms and the processing
power provided by GPUs, however,
KinectFusion works robustly.
Mobile Devices
While PCs are often built with a CPU
and a GPU on separate chips, mobile
devices such as smartphones and tablets put all the computing elements
on a single chip. Such an SoC (system
on chip) contains one or more CPUs, a
GPU, and several signal processors
for audio and video processing and data
communication. All modern smartphones and some tablets also contain
one or more cameras, and OpenCV is
available on both Android and iOS operating systems. With all these components, it is possible to create mobile
vision applications. The following sections look at the mobile hardware in
more detail, using NVIDIA’s Tegra 2 and
Tegra 3 SoCs as examples, and then introduce several useful multimedia APIs.
Finally, two mobile vision applications
are presented: panorama creation and
video stabilization.
Tools for Mobile Computer Vision
At the core of any general-purpose
computer is the CPU. While Intel’s
x86 instruction set dominates desktop
computers, almost all mobile phones
and tablets are powered by CPUs from
ARM. ARM processors follow the RISC
(reduced instruction set computing)