Energy savings with GLSL on Tegra 3.

OpenCV function (10,000 iterations)    Energy savings
median blur                            3.43
planar warper                          6.25
warpPerspective                        6.45
cylindrical warper                     3.89
blur3x3                                3.60
warpAffine                             15.38

approach, as can be deduced from
ARM’s original name, Advanced RISC
Machines. While x86 processors were
traditionally designed for high computing power, ARM processors were
designed primarily for low-power usage, which is a clear benefit for battery-powered devices. As Intel is reducing
power usage in its Atom family for mobile devices, and recent ARM designs
are getting increasingly powerful, they
may in the future reach a similar design point, at least on the high end of
mobile computing devices. Both Tegra
2 and Tegra 3 use ARM Cortex-A9 CPUs.
Mobile phones used to have only
one CPU, but modern mobile SoCs are
beginning to sport several, providing
symmetric multiprocessing. The reason is the potential for energy savings.
One can reach roughly the same level
of performance using two cores running at 1 GHz each as with one core
running at 2 GHz. Since the power consumption increases super-linearly with
the clock speed, however, these two
slower cores together consume less
power than the single faster core. Tegra
2 provides two ARM cores, while Tegra
3 provides four. Tegra 3 actually contains five (four plus one) cores, out of
which one, two, three, or four cores can
be active at the same time. One of the
cores, known as the shadow or companion core, is designed to use particularly little energy but can run only
at relatively slow speeds. That mode is
sufficient for standby, listening to music, voice calls, and other applications
that rely on dedicated hardware such
as the audio codec and require only a
few CPU cycles. When more processing
power is needed (for example, reading email), the slower core is replaced
by one of the faster cores, and for increased performance (browsing, gaming) additional cores kick in.
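
The energy argument for several slower cores can be checked with a back-of-the-envelope model. The sketch below assumes that dynamic power grows with the cube of the clock frequency, which follows from the textbook relation P ~ C * V^2 * f when supply voltage scales roughly with frequency. The cubic exponent and the function names are illustrative assumptions, not measurements from any Tegra device; static leakage, voltage floors, and imperfect parallel scaling are ignored.

```python
def relative_power(freq_ghz, exponent=3.0):
    """Relative dynamic power of one core at a given clock (arbitrary units).

    Assumption: power grows as freq ** exponent. The cubic default comes from
    P ~ C * V^2 * f with supply voltage scaling roughly with frequency.
    """
    return freq_ghz ** exponent


def compare(single_ghz=2.0, cores=2, exponent=3.0):
    """Compare one fast core with `cores` slower cores of equal total clock."""
    per_core_ghz = single_ghz / cores
    single = relative_power(single_ghz, exponent)
    multi = cores * relative_power(per_core_ghz, exponent)
    return single, multi, single / multi


single, multi, ratio = compare()
print(f"one core at 2 GHz:  {single:.1f} units")
print(f"two cores at 1 GHz: {multi:.1f} units")
print(f"power ratio:        {ratio:.1f}x")
```

Under this idealized model the two 1 GHz cores draw a quarter of the power of the single 2 GHz core. With a linear exponent the two configurations would draw the same power, which is why the super-linear growth is the key to the argument.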
SIMD (single instruction, multiple
data) processing is particularly useful
for pixel data, as the same instruction can be used on multiple pixels
simultaneously. SSE is Intel’s SIMD
technology, which exists on all modern x86 chips. ARM has a similar technology called NEON, which is an optional coprocessor in the Cortex-A9.
NEON can process up to eight,
and sometimes even 16, pixels at the
same time, while the CPU can process only one element at a time. This
is very attractive for computer-vision
developers, as it is often easy to obtain a three- to fourfold
speedup, and with careful optimization even more than sixfold. Tegra 2
did not include the NEON extension,
but each of Tegra 3’s ARM cores has a
NEON coprocessor.
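
To make the SIMD idea concrete, the following sketch emulates it in plain Python. It is not NEON code: the function `saturating_add_lanes` is a hypothetical stand-in for a single SIMD instruction operating on 16 eight-bit lanes, and the example only counts how many such instructions are needed to brighten a row of pixels compared with a one-pixel-at-a-time loop.

```python
LANES = 16  # assumption: 16 eight-bit lanes, as in a 128-bit SIMD register


def saturating_add_lanes(a, b):
    """Emulate ONE SIMD instruction: lane-wise 8-bit add, clamped to 255."""
    assert len(a) == len(b) <= LANES
    return [min(x + y, 255) for x, y in zip(a, b)]


def brighten_simd(pixels, delta):
    """Brighten 8-bit pixels LANES at a time; return (result, instruction count)."""
    out, instructions = [], 0
    for start in range(0, len(pixels), LANES):
        chunk = pixels[start:start + LANES]
        out.extend(saturating_add_lanes(chunk, [delta] * len(chunk)))
        instructions += 1
    return out, instructions


def brighten_scalar(pixels, delta):
    """Reference version: one pixel per add instruction."""
    out = [min(p + delta, 255) for p in pixels]
    return out, len(pixels)


row = list(range(0, 256, 4))  # 64 pixels standing in for one image row
simd_out, simd_ops = brighten_simd(row, 100)
scalar_out, scalar_ops = brighten_scalar(row, 100)
print("same result:", simd_out == scalar_out)
print("scalar adds:", scalar_ops, "SIMD adds:", simd_ops)
```

The instruction count is an upper bound on the benefit, not a speed measurement. Loads, stores, and memory bandwidth are not modeled here, which is one plausible reason why speedups in practice are smaller than the lane count.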
All modern smartphones include
a GPU. The first generation of mobile
GPUs implemented the fixed-functionality graphics pipeline of OpenGL ES
1.0 and 1.1. Even though the GPUs were
designed for 3D graphics, they could
be used for a limited class of image-processing operations such as warping and blending. The current mobile
GPUs are much more flexible and support OpenGL Shading Language (GLSL)
programming with the OpenGL ES 2.0
API, allowing programmers to run fairly complicated shaders at each pixel.
Thus, many old-school GPGPU tricks
developed for desktop GPUs about 10
years ago can now be reused on mobile
devices. The more flexible GPU computing languages such as CUDA and
OpenCL will replace those tricks in the
coming years but are not available yet.
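
The per-pixel shader model can be illustrated without a GPU. The sketch below is a Python emulation, not GLSL: `blur3x3_at` plays the role of a fragment shader body that computes one output pixel from its 3x3 neighborhood, and `run_shader` stands in for the GPU invoking that body once per pixel (in parallel on real hardware). Both names are hypothetical, and clamping coordinates at the border is an assumption modeled on clamp-to-edge texture addressing.

```python
def blur3x3_at(image, x, y):
    """Per-pixel kernel: average of the 3x3 neighborhood around (x, y).

    Coordinates outside the image are clamped to the nearest border pixel.
    """
    height, width = len(image), len(image[0])
    total = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            sx = min(max(x + dx, 0), width - 1)
            sy = min(max(y + dy, 0), height - 1)
            total += image[sy][sx]
    return total / 9.0


def run_shader(shader, image):
    """Invoke `shader` once per output pixel, as a GPU would do in parallel."""
    return [[shader(image, x, y) for x in range(len(image[0]))]
            for y in range(len(image))]


image = [[0, 0, 0, 0],
         [0, 9, 9, 0],
         [0, 9, 9, 0],
         [0, 0, 0, 0]]
blurred = run_shader(blur3x3_at, image)
for row in blurred:
    print(row)
```

Because each output pixel depends only on the input image and its own coordinates, the calls are independent of one another. That independence is what lets a GPU run the same shader on many pixels at once.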
Consumption and creation of audio and video content is an important use
case on modern mobile devices. To support them, smartphones contain
dedicated hardware encoders and decoders for both audio and video.
Additionally, many devices have a special ISP (image signal processor)
that processes the pixels streaming out from the camera. These media
accelerators are not as easily accessible and useful for computer-vision
processing, but the OpenMAX standard helps [1]. OpenMAX defines three
different layers: AL (application), IL (integration), and DL
(development). The lowest, DL, specifies a set of primitive functions
from five domains: audio/video/image coding and image/signal processing.
Some of them are of potential interest for computer-vision developers,
especially video coding and image processing, because they provide a
number of simple filters, color space conversions, and arithmetic
operations. IL is meant for system programmers implementing the
multimedia framework and provides tools such as camera control. AL is
meant for application developers and provides high-level abstractions and
objects such as Camera, Media Player, and Media Recorder. The OpenMAX
APIs are useful for passing image data efficiently between the various
accelerators and other APIs such as OpenGL ES.
OpenCV on Tegra
A major design and implementation
goal for OpenCV has always been high
performance. Porting both OpenCV
and applications to mobile devices
requires care, however, to retain a sufficient level of performance. OpenCV
has been available on Android since
the Google Summer of Code 2010, when
it was first built and run on a Google
Nexus One. Several demo applications
illustrated almost real-time behavior,
but it was obvious that OpenCV needed optimization and fine-tuning for
mobile hardware.