embed our frameworks in these productivity languages so that using them
looks like writing other programs in
those languages, and so that we can
interoperate with other code written in
those languages, such as libraries for
visualization, data I/O, or communication. The programmer delineates the
portions of code that are intended to
be processed by the framework by using some explicit mechanism from the
host language, such as function decorators or inheritance. Hence, we call
our approach selective and embedded:
selective because we specialize only
an explicitly delineated subset of the
program, in an explicitly delineated
subset of the host language, and embedded because programs using our
frameworks are embedded in the host language.
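A minimal sketch of such a delineation mechanism, in Python (the host language used by the projects below). The `specialize` decorator, the `registry`, and the `saxpy` function are hypothetical names for illustration, not part of any actual SEJITS framework:

```python
# Hypothetical sketch: an explicit host-language mechanism (here, a
# decorator) marks the functions a framework may process; everything
# else remains ordinary Python.

registry = {}

def specialize(fn):
    """Mark fn as belonging to the delineated subset that the
    framework is allowed to specialize."""
    registry[fn.__name__] = fn
    def wrapper(*args, **kwargs):
        # A real framework would translate and compile fn's body here;
        # this sketch simply runs the original Python definition.
        return fn(*args, **kwargs)
    return wrapper

@specialize
def saxpy(a, xs, ys):
    # Written in the restricted subset the framework understands.
    return [a * x + y for x, y in zip(xs, ys)]
```

Code outside `@specialize`-marked functions runs as plain Python, so the program still looks and behaves like any other program in the host language.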
Specialization refers to the property that our programs are constructed as a specialized composition of elements drawn from the library provided by the framework. At runtime, we specialize this composition to create parallel implementations of the original computation. Specialization happens at runtime because frameworks may need to specialize differently based on properties of the input data to a particular computation, such as its structure and size. Consequently, our approach provides just-in-time specialization.
There are several SEJITS projects
underway, all currently using Python
as the host language: Copperhead,
for data parallelism; PySKI, for sparse
matrix computations; and a specializer for computations on meshes and
grids. We believe that SEJITS will allow us to build many specializers, thus
enabling the creation of many pattern-oriented frameworks, and ultimately
improving the productivity of parallel programmers.
THE FUTURE OF PARALLEL COMPUTING
Parallel programming is becoming ubiquitous: computationally intensive applications must now be written to take advantage of parallelism if they are to see future performance increases. The parallel programming problem has therefore become critically important to these applications.
Fortunately, the kind of parallelism we can access with today’s highly integrated parallel processors is available in many classes of computationally intensive applications. Our experience shows that careful attention to software architecture, and to the details of how computations are mapped to parallel platforms, can yield high-performance parallel programs. Consequently, if we can make parallel programmers more productive, parallelism will succeed in delivering increased performance to computationally intensive applications.
Toward this goal, we are constructing a pattern language for parallelism,
which helps us reason about the parallelism in our applications, communicate about the challenges we face in
parallelizing our computations, and
connect to the lessons that other
programmers have learned when facing parallel programming challenges
of their own.
We are building pattern-oriented
frameworks that raise the level of abstraction of parallel programming as
well as encourage good software architecture. We have identified a methodology for building these frameworks
and easing their adoption, and several
projects are under way to prove the utility of our approach.
Ultimately, we are very optimistic
about the prospects for parallelism
and are excited to be a part of the ongoing shift toward parallel computing.
Bryan Catanzaro is a PhD candidate in the electrical
engineering and computer science department at the
University of California-Berkeley. His research interests
center on programming models for many-core computers,
with an applications-driven emphasis. He has an MS in
electrical engineering from Brigham Young University.
Kurt Keutzer is a professor of electrical engineering and
computer science at the University of California-Berkeley,
and the principal investigator in the Parallel Computing
Laboratory. His research interests include patterns and
frameworks for efficient parallel programming. Keutzer
has a PhD in computer science from Indiana University. He
is a fellow of the IEEE.
The authors acknowledge support from Microsoft (Award #024263) and Intel (Award #024894) and matching funding by U.C. Discovery (Award #DIG07-10227). Additional support comes from Par Lab affiliates National Instruments, NEC, Nokia, NVIDIA, Samsung, and Sun.