imaging device, pixels in the images themselves, and independent tasks in the computation.

Figure 2: The PALLAS group has investigated parallel algorithms for many computationally intensive applications, including video analysis, speech recognition, contour detection, image classification, and medical image reconstruction.
We have spent considerable effort examining applications from many domains, and our experience shows that parallelism is widespread in computationally intensive applications. We are optimistic that the ubiquitous application of parallelism will continue to deliver the performance increases that drive future generations of computing applications. (More information about our work in parallel applications can be found at http://parlab.eecs.berkeley.edu/research/pallas.)
tines for image classification, we improved training times by 78x and classification times by 73x compared to a
sequential processor.
We then examined video analysis, first building a variational optical flow routine that computes the vector motion field for moving objects in video sequences. Parallelism in this application comes from the pixels and frames of the video sequence. Using a 240-way parallel GPU, we improved the runtime by 35x over a sequential processor. We then used this routine to build a point tracker that traces the motion of objects in video sequences. Our solution keeps track of 1,000x more points than competing state-of-the-art point trackers, while providing 66 percent higher accuracy than other approaches on video scenes containing quickly moving objects.
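The pixel-level parallelism described above can be illustrated with a small sketch. This is not the PALLAS code; it is a minimal NumPy example (the function name `pixel_parallel_flow_terms` is hypothetical) showing the per-pixel quantities that gradient-based optical flow methods start from, expressed as whole-array operations in which every pixel is computed independently, the same data-parallel structure a GPU implementation exploits:

```python
import numpy as np

def pixel_parallel_flow_terms(frame0, frame1):
    """Per-pixel terms used by gradient-based optical flow.

    Every pixel is independent of every other, so the computation is
    written as vectorized (data-parallel) array operations -- the same
    structure a GPU kernel would map one thread per pixel onto.
    """
    f0 = frame0.astype(np.float64)
    f1 = frame1.astype(np.float64)
    # Spatial gradients of the first frame (central differences).
    Ix = np.gradient(f0, axis=1)
    Iy = np.gradient(f0, axis=0)
    # Temporal derivative between consecutive frames.
    It = f1 - f0
    return Ix, Iy, It

# Toy example: a bright square shifted one pixel to the right.
a = np.zeros((8, 8)); a[2:5, 2:5] = 1.0
b = np.zeros((8, 8)); b[2:5, 3:6] = 1.0
Ix, Iy, It = pixel_parallel_flow_terms(a, b)
```

A variational solver would then iterate over these terms to recover the motion field; the parallelism across frames mentioned in the text comes from running many such frame pairs concurrently.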
Speech recognition is another field we have been examining. It's a challenging application to parallelize, since much of the computation involves independent, yet irregular, accesses and updates to large graph structures that model the possibilities of spoken language. Our parallel speech recognition engine runs 10x faster than a sequential processor, using a 240-way parallel GPU. Importantly, we are about 3x faster than real time, which means accurate speech recognition can be applied to many more applications than was previously feasible using purely sequential approaches [10].
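To make the "independent yet irregular" access pattern concrete, here is a deliberately simplified sketch (not the PALLAS engine; the function `frontier_step` and the graph layout are illustrative assumptions) of one frame of Viterbi-style score propagation over a decoding graph. Each active state can be processed independently, but states have irregular fan-out, and two states may reach the same successor, requiring a conflicting max-reduction, which is exactly what makes this workload hard to parallelize:

```python
from collections import defaultdict

def frontier_step(arcs, active, scores):
    """One step of Viterbi-style beam propagation.

    arcs:   dict mapping a state to its outgoing (next_state, weight) arcs
    active: the currently active states (the frontier)
    scores: dict mapping each active state to its accumulated log score

    The per-state loop bodies are independent (parallelizable), but the
    max-update on next_scores is a reduction over irregular, data-dependent
    destinations -- on a GPU this becomes an atomic-max or a segmented
    reduction rather than a simple map.
    """
    next_scores = defaultdict(lambda: float("-inf"))
    for s in active:                   # independent per-state work
        for t, w in arcs.get(s, ()):   # irregular fan-out
            cand = scores[s] + w
            if cand > next_scores[t]:  # conflicting max-reduction
                next_scores[t] = cand
    return dict(next_scores)

# Tiny graph: states 0 and 1 both reach state 2; the better path wins.
arcs = {0: [(2, -1.0)], 1: [(2, -0.5)]}
scores = {0: 0.0, 1: -0.2}
out = frontier_step(arcs, [0, 1], scores)
```

In a real decoder this step runs once per audio frame over a graph with millions of arcs, which is where the 240-way parallelism pays off.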
PATTERNS
To implement high-performance, scalable parallel programs, we have found that we need a deeper understanding of software architecture: one that reveals the independent tasks and data in an application and prevents implementation details from obstructing parallelization. It is very difficult to unravel a tangled piece of software into a parallel implementation. A clear understanding of the structure of a computation, however, facilitates exploring a family of parallelization strategies, which is essential to producing a high-performance parallel implementation. Additionally, we want to avoid reinventing the wheel as much as possible, which means we need to be able to take advantage of the wisdom gleaned from others' experiences with parallel programming.
To help us reason about parallelism in our applications, as well as the experiences of other parallel programmers, we use software patterns, inspired by two books: Design Patterns: Elements of Reusable Object-Oriented Software [6] and Patterns for Parallel Programming [7]. Each pattern is a generalizable solution to a commonly encountered problem. The patterns interlock to form a pattern language: a common vocabulary for discussing parallelism at various scales in our applications. The pattern language helps us explore various approaches to our computation by pointing us to proven solutions to the problems we encounter while creating parallel software. Our pattern language consists of several patterns at each of the following levels (more in-