abstractions.
Several research centers1,2 are actively exploring the general problems
discussed here. The University of California, Berkeley, Parallel Computing
Lab and Stanford University’s Pervasive Parallelism Laboratory advocate
an application-driven approach to reinventing computing for parallelism.
Conclusion
The vertical integration of a parallel-processing system, compiler, programming, and algorithms proposed
here through the XMT framework with
the ICE/PRAM abstraction as its front-end is notable for its relative simplicity. ICE is a newly defined feature that
has not appeared in prior research, including my own, and is more rudimentary than prior parallel computing
concepts. Rudimentary concepts are
the basis for the fundamental development of any field. ICE can be viewed as
an axiom that builds on mathematical
induction, one of the more rudimentary concepts in mathematics. The
suggestion here of using a simple abstraction as the guiding principle for
reinventing computing for parallelism
also appears to be new. Considerable
evidence suggests it can be done (see
the sidebar “Eye-of-a-Needle Aphorism”).
The following comparison with a
chapter on multithreading algorithms
in the 2009 textbook Introduction to Algorithms by Cormen et al.9 helps clarify
some of the article’s contributions.
The 1990 first edition of Cormen et
al.9 included a chapter on PRAM algorithms emphasizing the role of work-depth design and analysis; the 2009
chapter9 likewise emphasized work-depth analysis. However, to match current commercial hardware, the 2009
chapter turned to a variant of dynamic
multithreading (in lieu of work-depth
design) in which the main primitive
was similar to the XMT sspawn command (discussed here). One thread
was able to generate only one more
thread at a time; these two threads
would then generate one more thread
each, and so on, rather than freeing
the programmer to design directly for
the work-depth analysis that follows
later in the same 2009 chapter.
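To make this contrast concrete, here is a minimal C-like sketch of the thread-doubling pattern a single-spawn primitive imposes; the helper names create_single_thread and body are illustrative placeholders, not taken from Cormen et al.9 or from the XMTC language definition.

    /* Thread-doubling sketch: to cover indices low..high, each running
     * thread repeatedly hands half of its range to one newly created
     * thread, so reaching n concurrent threads takes roughly log2(n)
     * rounds of thread creation. */
    void start_range(int low, int high)
    {
        while (low < high) {
            int mid = (low + high) / 2;
            /* one new thread takes the upper half of the range */
            create_single_thread(start_range, mid + 1, high);
            high = mid;            /* keep the lower half ourselves */
        }
        body(low);                 /* one index left; do the per-index work */
    }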
Cormen et al.’s9 dynamic multithreading should encourage hardware
enhancements that allow many threads
to start simultaneously in the time
required to start one thread.
A step ahead of available hardware,
XMT includes a spawn command that
spawns any number of threads upon
transition to parallel mode. Moreover, the ICE abstraction incorporates
work-depth early in the design workflow, similar to Cormen et al.’s 1990
first edition.9
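By contrast, a minimal sketch of the same n-way parallelism in XMTC-style syntax, assuming the spawn(low, high) block and the $ thread-ID variable (not verified against any particular XMTC release), starts all the threads together upon the transition to parallel mode:

    spawn(0, n - 1) {
        /* $ is this thread's ID; all n threads begin together, so the
         * programmer can design directly against the work-depth bound. */
        body($);   /* body is the same illustrative per-index work as above */
    }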
The O(log n) depth parallel merging
algorithm versus the O(log² n) depth
one in Cormen et al.9 demonstrated an
XMT advantage over current hardware,
as XMT allows a parallel algorithm for
the same problem that is both faster and simpler. The XMT hardware
scheduling brought the hardware performance model much closer to work-depth and allowed the XMT workflow
to align the design with its analysis from the start.
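For concreteness, one standard way to reach an O(log n) depth bound for merging is cross-ranking by binary search: every element finds its rank in the other sorted array, with all searches running concurrently. The sketch below assumes the same XMTC-style syntax as above, two sorted input arrays A and B of n distinct elements each, and an output array C; its work is O(n log n), and the article's own merging algorithm may differ in detail, since the comparison here concerns depth.

    spawn(0, n - 1) {
        /* Rank A[$] in B: count the elements of B smaller than A[$]. */
        int lo = 0, hi = n;
        while (lo < hi) {
            int mid = (lo + hi) / 2;
            if (B[mid] < A[$]) lo = mid + 1; else hi = mid;
        }
        C[$ + lo] = A[$];   /* final position = own index + rank in B */
    }
    /* A symmetric spawn block places the elements of B into C. */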
Several features of the serial paradigm made it a success, including a
simple abstraction at the heart of the
“contract” between programmers
and builders, the software spiral, ease
of programming, ease of teaching,
and backward compatibility on serial
code and application programming.
The only one of these features that XMT, like other
multi-core approaches, does not provide is speedups for serial code. The
ICE/PRAM/XMT workflow and architecture provide a viable option for
the many-core era. My XMT solution
should challenge and inspire others to
come up with competing abstraction
proposals or alternative architectures
for ICE/PRAM. Consensus around an
abstraction will move CS closer to converging on a many-core platform
and to putting the software spiral back
on track.
The XMT workflow also gives programmers a productivity advantage.
For example, I have traced several errors in student-developed XMTC programs to shortcuts the students took
around the ICE algorithms. Overall,
improved understanding of programmer productivity, a traditionally difficult issue in parallel computing,
must be a top priority for architecture research. To the extent possible,
evaluation of productivity should be
on par with performance and power.
For starters, productivity benchmarks
must be developed.