pletion without synchronization (no
busy-waits) and synchronizing access
to shared data with prefix-sum (
fetch-and-add type) instructions. These features result in a flexible programming
style that accommodates the ICE abstraction and encourages program development for a range of applications.
The reinvention of computing for
parallelism also requires pulling together a number of technical communities. My 2009 paper26 sought to build
a bridge to other architectures by casting the abstraction-centric vision of
this article as a possible module in
them, identifying a limited number of
capabilities the module provides and
suggesting a preferred embodiment
of these capabilities using concrete
“hardware hooks.” If it is possible
to augment a computer architecture
through them (with hardware hooks
or other means), the ICE abstraction
and the programmer’s workflow, in
line with this article, can be supported. The only significant obstacle in today’s multi-core architectures is their
large cache-coherent local caches.
Their limited scalability with respect
to power gives vendors more reasons
beyond an easier programming model
to let go of this obstacle.
PRAM parallel algorithmic ap-
proach. The parallel random-access
machine/model (PRAM) virtual model
of computation is a generalization of
the random-access machine (RAM)
model. 9 RAM, the basic serial model
behind standard programming lan-
guages, assumes any memory access
or any operation (logic or arithmetic)
takes unit-time (serial abstraction).
The formal PRAM model assumes a
certain number, say, p of processors,
each able to concurrently access any
location of a shared memory in the
same time as a single access. PRAM
has several submodels that differ by
assumed outcome of concurrent ac-
cess to the same memory location for
either read or write purposes. Here, I
note only one of them—the Arbitrary
Concurrent-Read Concurrent-Write
(CRCW) PRAM—which allows con-
current accesses to the same memory
location for reads or writes; reads
complete before writes, and an arbi-
trary write (to the same location, un-
known in advance) succeeds. PRAM
algorithms are essentially prescribed
as a sequence of rounds and, for each
round, up to p processors execute con-
currently. The performance objective
is to minimize the number of rounds.
The PRAM parallel-algorithmic ap-
proach is well-known and has never
been seriously challenged by any
other parallel-algorithmic approach
in terms of ease of thinking or wealth
of knowledgebase. However, PRAM
is also a strict formal model. A PRAM
algorithm must therefore prescribe
for each and every one of its p proces-
sors the instruction the processor ex-
ecutes at each time unit in a detailed
computer-program-like fashion that
can be quite demanding. The PRAM-
algorithms theory mitigates this in-
struction-allocation scheme through
the work-depth (WD) methodology.
figure 3. serial and parallel execution modes.
Serial
mode
Parallel
mode
Serial
mode
Parallel
mode
Serial
mode
…
Spawn
…
Join
Spawn
…
Join
…
january 2011 | vol. 54 | no. 1 | communications of the acm 79