Technical Perspective
Abstraction for Parallelism
By Katherine Yelick
Looking for some new insight into an old
problem? The following research paper
by Milind Kulkarni et al. addresses the
familiar problem of writing parallel applications and uses a fresh approach
based on data abstraction to allow some
challenging programs to be parallelized.
Going back more than 30 years to the
foundational work by Owicki and Gries
on the semantics of parallel programs,
and through decades of work on automatic parallelization of Fortran and C
programs, the focus in the research and
commercial communities has been on
reasoning at the concrete level of read
and write operations: if two iterations
of a loop access the same variable, and
at least one of them performs a write,
then the iterations cannot execute in
parallel. When combined with a static
compile-time approach, the necessarily
conservative analysis techniques mean
that many loops cannot be parallelized,
especially for programs with pointer-based data structures.
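The concrete read/write criterion described above can be sketched as a simple check on the read and write sets of two iterations (an illustrative Python sketch; the function and variable names are assumptions, not from the paper):

```python
# Concrete conflict rule: two loop iterations may run in parallel only
# if no variable is written by one and accessed (read or written) by
# the other.

def iterations_conflict(reads_a, writes_a, reads_b, writes_b):
    """Return True if the two iterations cannot safely run in parallel."""
    return bool(
        writes_a & (reads_b | writes_b) or
        writes_b & reads_a
    )

# Two iterations that only read x can run in parallel...
print(iterations_conflict({"x"}, set(), {"x"}, set()))  # False
# ...but if one of them also writes x, they conflict.
print(iterations_conflict({"x"}, {"x"}, {"x"}, set()))  # True
```

A static compiler must prove these sets disjoint at compile time for every pair of iterations, which is why pointer-based data structures defeat the analysis.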
With a computer industry betting
on multicore, the need for solutions
to the parallel software problem has
reached a new level of criticality. There
has been a resurgence of parallelism research, much of it focused on dynamic
discovery of parallelism using speculative techniques. The authors build on
the idea of dynamic parallelism discovery by combining loop constructs
with conflict analysis and a rollback
mechanism to support speculative parallelism. The Galois system described
in the paper has both ordered and unordered loops, but presents users with
a serial semantics in both cases. Galois
uses the type system to further control
the behavior of such loops. Objects that
use a traditional model of speculative
parallelism are instances of so-called
“catch-and-keep” classes, because the
runtime system holds a lock throughout an iteration to ensure that the iterations appear to execute serially.
But an interesting class of algorithms allows for correct behavior even
in the presence of conflicting accesses.
For example, branch and bound search
can proceed in any order but must update the value of a variable representing the current bound. The authors add a twist
to speculative execution: programmers
are allowed to specify that two method
invocations on an object commute,
meaning they can safely be reordered.
In the branch and bound example, if
a class is defined for the bound, with
only operations to read the bound and
update it monotonically by performing a max operation with a newly discovered bound, then the update operations commute with one another.
Moreover, methods that have no effect
on the abstract value, but “benevolent
side effects” on the underlying state
will commute at the abstract level despite having conflicting accesses.
Classes that have commutativity
specifications are called “catch-and-release” classes, because the implementation holds a lock only during a
method invocation to ensure atomicity
of the operation, but not throughout
the entire iteration. Serialization of the
iterations is ensured instead through
commutativity of the methods and,
if abstract conflicts are discovered,
through rollbacks.
The commutativity specification
combined with unordered loops gives
a programming model that is somewhere between explicit parallelism and
sequential programming, as the programmer may give many opportunities
for reordering operations, but the program still has a serial semantics. The
one final concept needed to parallelize
some complex irregular applications is
the idea that methods with side effects
on catch-and-release classes must have
inverse methods to “undo” the effects
in case an abstract-level conflict is
discovered. This allows iterations to
execute in parallel, possibly with conflicting reads and writes to variables,
as long as the methods involved are
known to commute. If an abstract conflict is discovered between two method
invocations that are not commutative,
the runtime layer will roll back one of
the iterations using the inverse of the
operations that had been executed.
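The inverse-method idea can be sketched as follows (an assumed API for illustration, not the Galois implementation): each side-effecting call on a catch-and-release object logs its own inverse, and on an abstract conflict the runtime replays the log in reverse to undo the iteration.

```python
# Illustrative sketch: a worklist whose add() returns its inverse,
# so an iteration's effects can be rolled back if an abstract
# conflict is discovered.

class Worklist:
    def __init__(self):
        self.items = set()

    def add(self, x):
        self.items.add(x)
        # The inverse of add(x) is a removal of x.
        return lambda: self.items.discard(x)

undo_log = []
w = Worklist()
undo_log.append(w.add("task-1"))
undo_log.append(w.add("task-2"))

# Abstract conflict detected: undo this iteration's effects in
# reverse order of execution.
for undo in reversed(undo_log):
    undo()
print(w.items)  # set()
```

The key point is that the undo operates at the level of the abstraction (remove what was added), not by restoring raw memory, which is what lets conflicting concrete accesses coexist.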
The authors successfully apply
these ideas to two complex parallelization problems—an unstructured mesh
refinement algorithm and a clustering
algorithm used in data mining. Both
problems involve irregular control flow
and pointer-based data structures,
making them intractable for static parallelism approaches or even speculative parallelism based on concrete conflict detection.
This paper presents the general ideas
using these two compelling examples,
and the concepts are both original and
thought provoking. Abstraction mechanisms like object orientation are often considered deterrents to performance and parallelization, but in this
approach data abstraction provides the
specification point for commutativity
and therefore a key to parallelization.
The authors raise a number of interesting semantic issues in the examples
and overall approach, and for those interested in a new perspective on parallel programming, I highly recommend
this paper.
Katherine Yelick is a professor of Electrical Engineering
and Computer Sciences at the University of California,
Berkeley, and the director of the National Energy
Research Scientific Computing Center (NERSC), a
national supercomputing facility at Lawrence Berkeley
National Laboratory that serves the Department of
Energy.