functions, which include more overhead and many steps that are not easy
to parallelize with a GPU. For example,
the granularity for color conversion is
per-pixel, making it easy to parallelize. Pedestrian detection, on the other
hand, is performed in parallel for each
possible pedestrian location, and parallelizing the processing of each window position is limited by the amount
of on-chip GPU memory.
As an example, we accelerated two packages from the Robot Operating System (ROS)8, stereo visual odometry and textured object detection, that were originally developed for the CPU. Both contain many functional blocks and a class hierarchy.
Wherever it made sense, we offloaded the computations to the GPU. For example, OpenCV GPU implementations performed Speeded-Up Robust Feature (SURF) key-point detection, matching, and search of stereo correspondences (block matching) for stereo visual odometry. The accelerated packages were a mix of CPU/GPU implementations. As a result, the visual odometry pipeline was accelerated 2.7 times, and textured object detection was accelerated 1.5 to 4 times, as illustrated in Figure 3. Data-transfer overhead was not a significant part of the total algorithm time. This example shows that replacing only a few lines of code results in a considerable speedup of a high-level vision application.
Figure 3. Textured object detection application: CPU and GPU.
Figure 4. Stereo block-matching pipeline. GPU stages: rectification, matching, low-texture filtering, and color and show; CPU stage: speckle filtering.
Figure 5. RGB frame, depth frame, ray-casted frame, and point cloud.
Stereo Correspondence with GPU Module
Stereo correspondence search in high-resolution video is a demanding application that demonstrates how CPU and GPU computations can be overlapped. OpenCV's GPU module includes an implementation that can process a full HD resolution stereo pair in real time (24 frames per second) on the NVIDIA GTX580.
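To put that figure in perspective, a quick back-of-the-envelope calculation (using only the frame size and rate stated above) gives the required pixel throughput:

```python
# Pixel throughput implied by real-time full HD stereo matching.
# Derived from the frame size and rate above; nothing GPU-specific here.
width, height, fps = 1920, 1080, 24
pixels_per_second = width * height * fps
print(pixels_per_second)  # 49766400, i.e., roughly 50M left-image pixels per second
```

Each of those pixels additionally requires comparing windows at many candidate disparities, which is what makes the problem GPU-friendly: the per-pixel work is large, uniform, and independent.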
In a stereo system, two cameras are mounted facing the same direction. While faraway objects project to the same image locations in each camera, nearby objects project to different locations. This difference is called disparity. By locating, for each pixel in the left camera image, the pixel in the right image where the same surface point projects, you can compute the distance to that surface point from the disparity. Finding these correspondences between pixels in the stereo image pairs is the key challenge in stereo vision.
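The disparity-to-distance relationship can be sketched with the standard pinhole stereo formula Z = f·B/d, where f is the focal length in pixels, B the camera baseline, and d the disparity. The numbers below are illustrative, not from the article:

```python
# Depth from disparity for a rectified stereo pair: Z = f * B / d.
# f_px: focal length in pixels, baseline_m: camera separation in meters,
# disparity_px: horizontal offset between the matched pixels.
# Illustrative values only.

def depth_from_disparity(f_px, baseline_m, disparity_px):
    if disparity_px <= 0:
        return float("inf")  # zero disparity: point at infinity
    return f_px * baseline_m / disparity_px

# A nearby object produces a large disparity, a faraway one a small disparity.
near = depth_from_disparity(1000.0, 0.1, 50.0)  # 2.0 m
far = depth_from_disparity(1000.0, 0.1, 5.0)    # 20.0 m
```

Note that depth resolution degrades with distance: a one-pixel disparity error matters far more for the distant point than for the near one.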
This task is made easier by rectifying the images. Rectification warps the images into an ideal stereo pair in which each scene surface point projects to a matching image row, so only points on the same scan line need to be searched. The quality of a candidate match is evaluated by comparing a small window of pixels around the left-image pixel with a window around the candidate matching pixel. The pixel in the right image whose window best matches the window of the pixel in the left image is selected as the corresponding match.
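The matching step just described can be sketched in pure Python for a single scan line, using the sum of absolute differences (SAD) as the window-similarity measure. This is only an illustration of the principle; a production block matcher such as the one in OpenCV's GPU module is vastly more optimized:

```python
# Toy block matching on one scan line of a rectified stereo pair.
# For each left-image pixel, search a range of disparities and pick the
# right-image window with the lowest SAD (sum of absolute differences).

def sad(left_row, right_row, x_left, x_right, radius):
    return sum(abs(left_row[x_left + k] - right_row[x_right + k])
               for k in range(-radius, radius + 1))

def match_scanline(left_row, right_row, max_disp, radius=1):
    disparities = []
    for x in range(radius, len(left_row) - radius):
        best_d, best_cost = 0, float("inf")
        # d is constrained so the right-image window stays in bounds.
        for d in range(0, min(max_disp, x - radius) + 1):
            cost = sad(left_row, right_row, x, x - d, radius)
            if cost < best_cost:
                best_d, best_cost = d, cost
        disparities.append(best_d)
    return disparities

# The right row holds the same textured content shifted 2 pixels toward
# smaller x, so the recovered disparity is 2 wherever there is texture.
left = [0, 0, 9, 1, 7, 3, 8, 0, 0, 0]
right = left[2:] + [0, 0]
print(match_scanline(left, right, 4))  # [0, 0, 2, 2, 2, 2, 2, 0]
```

The textureless ends of the row match equally well at every disparity, so the search defaults to 0 there; this is exactly why the real pipeline includes a low-texture filtering stage to discard such unreliable matches.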
The computational requirements obviously increase with image size, because there are more pixels to process. In a larger image the range of disparities, measured in pixels, also increases, which requires a larger search radius. For small-resolution images the CPU may be sufficient to calculate the disparities; with full HD images, however, only the GPU can provide enough processing power.
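The compounding effect of those two factors can be made concrete with a rough cost model (the scaling argument is general; the specific numbers are illustrative, not measurements from the article): doubling the linear resolution quadruples the pixel count and also doubles the disparity search range, so the matching work grows roughly eightfold.

```python
# Rough cost model for window-based stereo matching: work is
# proportional to (number of pixels) x (disparity search range).
# Illustrative numbers only.

def matching_work(width, height, max_disp):
    return width * height * max_disp

half_hd = matching_work(960, 540, 96)
full_hd = matching_work(1920, 1080, 192)  # 2x linear resolution, 2x range
print(full_hd / half_hd)  # 8.0: 4x the pixels times 2x the search range
```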
Figure 4 presents a block-matching
pipeline that produces a disparity image d(x,y) such that LeftImage(x,y)
corresponds to RightImage(x-d(x,y),y).
The pipeline first rectifies the images