With the GPU version of cv::resize we were able to decrease scaling time from 41 milliseconds to 26 milliseconds per input frame, a 1.6-times local speedup. With the GPU implementation of image warping, we achieved even better local improvements: a boost of 8–14 times in performance, depending on the projection type. As a result, total application speedup was 1.5–2.0 times, meeting the performance requirements.
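For illustration, the CPU-to-GPU swap follows the pattern in the minimal sketch below, using the OpenCV 2.x gpu module; the 0.5 scale factor is an assumed example, and real code would amortize the upload/download copies across several operations:

#include <opencv2/core/core.hpp>
#include <opencv2/gpu/gpu.hpp>

// Minimal sketch (not the production code) of moving a resize hotspot
// to the GPU. The scale factor and interpolation mode are illustrative.
cv::Mat resizeOnGpu(const cv::Mat& frame)
{
    cv::gpu::GpuMat d_src, d_dst;
    d_src.upload(frame);                          // host -> device copy
    cv::gpu::resize(d_src, d_dst, cv::Size(),     // same call shape as cv::resize
                    0.5, 0.5, cv::INTER_LINEAR);  // halve each dimension
    cv::Mat result;
    d_dst.download(result);                       // device -> host copy
    return result;
}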
Video stabilization. One of the negative consequences of recording video
without a tripod is camera shake,
which significantly degrades the
viewing experience. To achieve visually pleasing results, all movements should be smooth, and high-frequency variations in camera orientation and translation must be filtered out.
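The filter itself can take many forms; as one hedged illustration, a moving-average low-pass over the estimated per-frame translations captures the idea (the window radius and the plain box filter are assumptions, not a prescription):

#include <vector>
#include <opencv2/core/core.hpp>

// Hedged sketch: smooth a per-frame translation trajectory with a
// moving average, removing high-frequency jitter while keeping the
// low-frequency intended camera motion.
std::vector<cv::Point2f> smoothTrajectory(const std::vector<cv::Point2f>& t,
                                          int radius = 15)
{
    std::vector<cv::Point2f> out(t.size());
    for (int i = 0; i < (int)t.size(); ++i) {
        float sx = 0.f, sy = 0.f;
        int n = 0;
        for (int j = i - radius; j <= i + radius; ++j) {
            if (j < 0 || j >= (int)t.size()) continue;
            sx += t[j].x;
            sy += t[j].y;
            ++n;
        }
        out[i] = cv::Point2f(sx / n, sy / n);  // low-pass: window average
    }
    return out;  // compensation per frame = smoothed - original
}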
Numerous approaches have been developed; some have become open source or commercially available tools. Computationally intensive offline approaches take a considerable amount of time, while lightweight online algorithms are more suitable for mobile devices. High-end approaches often reconstruct the 3D movement of the camera and apply sophisticated nonrigid image warping to stabilize the video.² On mobile devices more lightweight approaches using translation, affine warping, or planar perspective transformations may make more sense.³
We experimented with translation
and affine models, and in both cases
the GPU was able to eliminate the
major hotspot: applying the compensating transformation to an input frame. Applying translation to compensate for the motion simply means shifting the input frame along the x and y axes and cropping the boundary areas for which the shifted frame no longer contains color information (see Figure 10).
In terms of programming, one
should choose a properly located submatrix and then resize it into a new
image at the same resolution as the
original video stream, as suggested
in Figure 11. Surprisingly, this simple step consumed more than 140 milliseconds. Our GPU GLSL implementation was five to six times faster than the C++ version and took about 25 milliseconds.
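In plain OpenCV calls (the GLSL version is not shown), the step looks roughly like the following sketch, where the crop margin and shift values are illustrative assumptions:

#include <opencv2/core/core.hpp>
#include <opencv2/imgproc/imgproc.hpp>

// Hedged sketch of translation compensation: select a shifted submatrix
// (ROI) of the input and resize it back to full resolution so every
// frame loses the same fixed border area.
cv::Mat compensateTranslation(const cv::Mat& frame,
                              float dx, float dy, int margin)
{
    cv::Rect roi(margin + cvRound(dx), margin + cvRound(dy),
                 frame.cols - 2 * margin, frame.rows - 2 * margin);
    roi &= cv::Rect(0, 0, frame.cols, frame.rows);  // clamp to the image

    cv::Mat stabilized;
    cv::resize(frame(roi), stabilized, frame.size(), 0, 0, cv::INTER_LINEAR);
    return stabilized;
}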
Nevertheless, 25 milliseconds is
still too long for a real-time algorithm,
which is why we next tried to obtain
more speed from asynchronous calls.
A special class was created for stabilizing frames on the GPU. This class immediately returns the result from the previous iteration, stored in its image-buffer field, and creates a tbb::task for processing the next frame. As a result, GPU processing is performed in the background, and the apparent cost and delay for the caller is equal to just copying a full frame. This trick was also applied to an expensive color-conversion procedure, and with further optimizations of the memory-access patterns we achieved real-time processing performance.
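The pattern can be sketched as follows; this illustration uses tbb::task_group rather than the raw TBB task API, and the class and member names are invented for illustration:

#include <opencv2/core/core.hpp>
#include <tbb/task_group.h>

// Hedged sketch of one-frame-latency pipelining: return the previous
// result immediately and process the new frame in a background task.
class AsyncStabilizer
{
public:
    // Returns the stabilized result of the *previous* frame (empty on
    // the first call); the caller pays only for copying a full frame.
    cv::Mat submit(const cv::Mat& frame)
    {
        tasks_.wait();                  // ensure the last task finished
        cv::Mat previous = result_.clone();
        cv::Mat input = frame.clone();  // own a copy for the background task
        tasks_.run([this, input] {
            result_ = stabilizeOnGpu(input);  // runs concurrently with caller
        });
        return previous;
    }

private:
    cv::Mat stabilizeOnGpu(const cv::Mat& frame);  // GPU work, not shown

    tbb::task_group tasks_;
    cv::Mat result_;
};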
Future Directions
GPUs were originally developed to accelerate the conversion of 3D scene
descriptions into 2D images at interactive rates, but as they have become
more programmable and flexible, they
have also been used for the inverse task
of processing and analyzing 2D images
and image streams to create a 3D description, to control some application
so it can react to the user or events in
the environment, or simply to create
higher-quality images or videos. As computer-vision applications become more commonplace, it will be interesting to see whether a different type of computer-vision processor, even more suitable for image processing, will be created to work alongside the GPU, or whether the GPU itself remains suitable for this task. The current mobile
GPUs are not yet as flexible as those on
larger computers, but this will change
soon enough.
OpenCV (and other related APIs such as the Point Cloud Library) have made it easier for application developers to use computer vision. They are well-documented and vibrant open source projects that keep growing, and they are being adapted to new computing technologies. Examples of this evolution are the transition from a C to a C++ API in OpenCV and the appearance of the OpenCV GPU module. The basic OpenCV architecture, however, was designed mostly with CPUs in mind. Maybe it is time to design a new API that explicitly takes heterogeneous multiprocessing into account, where the main program may run on a CPU or several CPUs, while major parts of the vision API run on different types of hardware: a GPU, a DSP (digital signal processor), or even a dedicated vision processor. In fact, Khronos has recently started working on such an API, which could work as an abstraction layer that allows innovation independently on the hardware side and allows high-level APIs such as OpenCV to be developed on top of this layer while being somewhat insulated from changes in the underlying hardware architecture.
Acknowledgments
We thank Colin Tracey and Marina Kolpakova for help with power analysis; Andrey Pavlenko and Andrey Kamaev for GLSL and NEON code; and Shalini Gupta, Shervin Emami, and Michael Stewart for additional comments. NVIDIA provided support, including hardware used in the experiments.
References
1. Khronos OpenMAX standard; http://www.khronos.org/openmax.
2. Liu, F., Gleicher, M., Wang, J., Jin, H., Agarwala, A. Subspace video stabilization. ACM Transactions on Graphics 30, 1 (2011), 4:1–4:10.
3. Matsushita, Y., Ofek, E., Ge, W., Tang, X., Shum, H.-Y. Full-frame video stabilization with motion inpainting. IEEE Transactions on Pattern Analysis and Machine Intelligence 28, 7 (2006), 1150–1163.
4. Newcombe, R.A., Izadi, S., et al. KinectFusion: Real-time dense surface mapping and tracking. IEEE International Symposium on Mixed and Augmented Reality (2011), 127–136.
5. OpenCV library; http://code.opencv.org.
6. Point Cloud Library; http://pointclouds.org.
7. Szeliski, R. Image alignment and stitching: A tutorial. Foundations and Trends in Computer Graphics and Vision 2, 1 (2006), 1–104.
8. Willow Garage. Robot Operating System; http://www.ros.org/wiki/.
Kari Pulli is a senior director at NVIDIA Research, where he heads the Mobile Visual Computing research team and works on topics related to cameras, imaging, and vision on mobile devices. He has worked on standardizing mobile media APIs at Khronos and JCP and was technical lead of the Digital Michelangelo Project at Stanford University.

Anatoly Baksheev is a project manager at Itseez. He started his career there in 2006 and was the principal developer of the multi-projector system Argus Planetarium. Since 2010 he has been the leader of the OpenCV GPU project. Since 2011 he has worked on the GPU acceleration module for the Point Cloud Library.

Kirill Kornyakov is a project manager at Itseez, where he leads the development of the OpenCV library for mobile devices. He manages activities on mobile operating-system support and computer-vision application development, including performance optimization for the NVIDIA Tegra platform.

Victor Eruhimov is CTO of Itseez. Prior to co-founding the company, he worked as a project manager and senior research scientist at Intel, where he applied computer-vision and machine-learning methods to automate Intel fabs and revolutionize data processing in semiconductor manufacturing.
© 2012 ACM 0001-0782/12/06 $10.00