Figure 2. Right column is a workflow from an ICE abstraction of an algorithm to
implementation; left column may never terminate.
[Flowchart labels, in extraction order: ICE; Parallel algorithm; Parallel program; XMT program; Insufficient inter-thread bandwidth? (no); Rethink algorithm: take better advantage of cache; Tune; XMT hardware; Hardware.]
a robust many-core platform coupled
with a new many-core software spiral
to serve the world of computing for
years to come. A software spiral is basically an infrastructure for the economy. Since advancing infrastructures
generally depends on government
funding, designating software-spiral
rebirth a killer app also motivates
funding agencies and major vendors
to support the work. The impact on
manufacturing productivity could further motivate them.
Programmer Workflow
ICE requires the lowest level of cognition from the programmer relative
to all current parallel programming
models. Other approaches require
additional steps (such as decomposition [10]). In CS theory, the speedup provided by parallelism is measured as
work divided by depth; reducing the
advantage of ICE/PRAM to practice is
a different matter.
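To make the work-divided-by-depth measure concrete, here is a minimal sketch in Python. The example problem (summing n values with a balanced binary tree), the function name `tree_sum_work_depth`, and the unit cost per addition are assumptions chosen for illustration; they do not come from the text above.

```python
def tree_sum_work_depth(n):
    """Return (work, depth) for summing n values with a balanced binary tree.

    Assumption: each addition costs one unit of work, and all additions
    in the same round can execute concurrently.
    """
    work = 0        # total additions across all rounds
    depth = 0       # number of parallel rounds
    remaining = n   # values still to be combined
    while remaining > 1:
        pairs = remaining // 2   # additions performed in this round
        work += pairs
        depth += 1
        remaining -= pairs       # each addition merges two values into one
    return work, depth


work, depth = tree_sum_work_depth(1024)
speedup_bound = work / depth     # theoretical speedup: work divided by depth
print(work, depth, speedup_bound)
```

For 1,024 inputs the sketch counts 1,023 additions over 10 rounds, so the theoretical speedup is about 102. That figure is only an upper bound under the idealized model, which is the gap the paragraph above refers to when it calls reduction to practice a different matter.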
The reduction to practice I have led
relies on the programmer’s workflow,
as outlined on the right side of Figure
2. Later, I briefly cover the parallel-algorithms stage. The step-by-step
PRAM explication, or “data-parallel”
instructions, represents a traditional
tightly synchronous outlook on parallelism. Unfortunately, tight step-by-step synchrony is not a good match
with technology, including its power
constraints.
To appreciate the difficulty of implementing
step-by-step synchrony
in hardware, consider two examples:
Memories based on long tightly synchronous
pipelines of the type seen in
Cray vector machines have long been
out of favor among architects of
high-performance computing; and processing
memory requests takes from one
to 400 clock cycles. Hardware must be
made as flexible as possible to advance
without unnecessary waiting for
concurrent memory requests.
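The cost of waiting on concurrent memory requests can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not a description of XMT or any real hardware: it assumes independent requests with no contention, draws latencies uniformly from the 1-to-400-cycle range quoted above, and uses hypothetical names (`completion_times`, `latencies`).

```python
import random


def completion_times(latencies):
    """Compare lock-step and relaxed progress for a set of threads.

    latencies[t][s] is the number of cycles thread t waits for its
    s-th memory request (toy model: no contention, no overlap).
    """
    steps = len(latencies[0])
    # Lock-step: a step ends only when its slowest request has returned.
    lockstep = sum(max(thread[s] for thread in latencies) for s in range(steps))
    # Relaxed: each thread advances at its own pace; the run ends when
    # the slowest thread finishes.
    relaxed = max(sum(thread) for thread in latencies)
    return lockstep, relaxed


rng = random.Random(0)  # fixed seed so the run is repeatable
latencies = [[rng.randint(1, 400) for _ in range(8)] for _ in range(64)]
lockstep, relaxed = completion_times(latencies)
print(lockstep, relaxed)
```

In this model the lock-step total can never be smaller than the relaxed total, because a sum of per-step maxima is at least the maximum of per-thread sums. With two threads whose latencies are [1, 400] and [400, 1], lock-step takes 800 cycles while relaxed progress takes 401.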