By Milind Kulkarni, Keshav Pingali, Bruce Walter, Ganesh Ramanarayanan, Kavita Bala, and L. Paul Chew
The problem of writing software for multicore processors
is greatly simplified if we could automatically parallelize
sequential programs. Although auto-parallelization has
been studied for many decades, it has succeeded only in a
few application areas such as dense matrix computations.
In particular, auto-parallelization of irregular programs,
which are organized around large, pointer-based data structures like graphs, has seemed intractable.
The Galois project is taking a fresh look at auto-parallelization. Rather than attempt to parallelize all programs no matter how obscurely they are written, we are
designing programming abstractions that permit programmers to highlight opportunities for exploiting parallelism
in sequential programs, and building a runtime system
that uses these hints to execute the program in parallel. In
this paper, we describe the design and implementation of
a system based on these ideas. Experimental results for two
real-world irregular applications, a Delaunay mesh refinement application and a graphics application that performs
agglomerative clustering, demonstrate that this approach is
1. iNTRoDuc TioN
A pessimist sees the difficulty in every opportunity; an
optimist sees the opportunity in every difficulty.
—sir winston Churchill
Irregular applications are organized around pointer-based
data structures such as graphs and trees, and are ubiquitous
in important application areas such as finite-elements, SAT
solvers, maxflow computations, and compilers. In principle,
it is possible to use a thread library (e.g., pthreads) or a combination of compiler directives and libraries (e.g., OpenMP)
to write parallel code for irregular applications, but it is well
known that writing explicitly parallel code can be very tricky
because of the complexities of memory consistency models,
synchronization, data races, etc. Tim Sweeney, who designed
the multithreaded Unreal 3 game engine, estimates that
writing multithreading code tripled software costs at Epic
Games (quoted in de Galas3).
From the earliest days of parallel computing, it has been
recognized that one way to circumvent the problems of
writing explicitly parallel code is auto-parallelization. 10 In
this approach, application programmers write sequential
programs, leaving it to the compiler or runtime system
to extract and exploit the latent parallelism in programs.
There is an enormous literature on algorithms and mechanisms for auto-parallelization, but like the characters in
Pirandello’s play Six Characters in Search of an Author, most
of them are in search of programs that they can parallelize. They can be divided into two categories: compile-time
techniques and runtime techniques. Compile-time techniques use static analyses to find independent computations in programs, and have succeeded in parallelizing
limited classes of irregular programs such as n-body methods. 1, 5, 20 Runtime techniques use optimistic parallelization: computations are parallelized speculatively, and
the runtime system detects conflicts between concurrent
computations and rolls them back as needed to preserve
the sequential semantics of the program. Optimistic parallelism is the basis of the popular Timewarp algorithm for
parallel event-driven simulation, 9 but efforts to build general-purpose systems based on optimistic parallelization,
such as thread-level speculation (TLS), 19, 22, 24 have had limited success. Because of these problems, interest in auto-parallelization has waned in recent years.
We are taking a fresh look at auto-parallelization, but
from a different perspective than prior work in this area.
Instead of trying to parallelize all application programs no
matter how obscurely written, the Galois project is focusing
on the following questions.
• Can we design sequential programming abstractions
that capture the most commonly occurring parallelism
patterns in programs?
• If so, what systems support is needed to auto-parallelize
programs that use these abstractions?
A useful analogy is relational database programming.
The SQL programmer views data as if they were organized
as a flat table (relations), and operates on the data using
high-level operations like joins and projections. Inside
the database system, relations are implemented in very
complex ways using B-trees, index structures, etc., and
the high-level operations are performed in parallel using
locks and transactions, but the relational abstractions
enable these complications to be hidden from the SQL
Can we carry out a similar program for irregular applications? Although we are far from having a complete solution,
the outlines of a solution for important patterns of parallelism are emerging from the fog. In this paper, we focus on
understanding and exploiting parallelism in iterative irregular applications. In Section 2, we describe parallelism patterns in two such applications: a Delaunay mesh refinement
The original version of this paper appeared in the
Proceedings of the 2007 ACM SIGPLAN Conference on Programming Language Design and Implementation.