“You may fire when you are ready, Gridley,” is the famous command from Commodore Dewey in the Battle of Manila Bay, 1898. He may not have realized it, but he was articulating the basic principle of dataflow computing, where an instruction can be executed as soon as its inputs are available. Dataflow has long fascinated computer architects as perhaps a more “natural” way for computation circuits to best exploit parallelism for performance.
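
The firing rule stated above (an instruction executes as soon as its inputs are available) can be illustrated with a minimal Python sketch. Everything named here (`Node`, `run`, `build_example`, the token format) is hypothetical and chosen only for illustration; it is not modeled on any particular dataflow machine or language. Each node counts arriving input values and becomes ready when all of its input slots are filled, regardless of the order in which values arrive.

```python
import operator
from collections import deque


class Node:
    """One dataflow instruction; it may fire once every input slot holds a value."""

    def __init__(self, name, op, arity):
        self.name = name
        self.op = op
        self.inputs = [None] * arity
        self.filled = 0
        self.successors = []  # (node, port) pairs that receive this node's output

    def receive(self, port, value):
        """Store an arriving value; return True when the node is ready to fire."""
        self.inputs[port] = value
        self.filled += 1
        return self.filled == len(self.inputs)


def run(initial_tokens):
    """Deliver tokens until no node is ready.

    Returns the outputs of nodes without successors and the firing order.
    """
    ready = deque()
    for node, port, value in initial_tokens:
        if node.receive(port, value):
            ready.append(node)

    results, order = {}, []
    while ready:
        node = ready.popleft()
        value = node.op(*node.inputs)  # fire: all inputs are present
        order.append(node.name)
        if not node.successors:
            results[node.name] = value
        for succ, port in node.successors:
            if succ.receive(port, value):
                ready.append(succ)
    return results, order


def build_example():
    """Graph for (a + b) * (c - d); add and sub do not depend on each other."""
    add = Node("add", operator.add, 2)
    sub = Node("sub", operator.sub, 2)
    mul = Node("mul", operator.mul, 2)
    add.successors.append((mul, 0))
    sub.successors.append((mul, 1))
    return add, sub, mul


add, sub, mul = build_example()
# Input values arrive in an arbitrary order; sub happens to be completed first.
tokens = [(sub, 1, 4), (add, 0, 2), (sub, 0, 10), (add, 1, 3)]
results, order = run(tokens)
print(results)  # {'mul': 30}
print(order)    # ['sub', 'add', 'mul']
```

In this sketch `sub` fires before `add` only because its inputs happened to arrive first; nothing orders the two nodes relative to each other, and `mul` fires only after both have delivered their results. The single `ready` queue is a simplification that runs one node at a time, whereas hardware could fire all ready nodes concurrently.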

A visiting alien may be forgiven for experiencing whiplash when shown how we treat parallelism in programs. Mathematical algorithms have abundant parallelism; the only limit is data dependency (an operator can be evaluated when its inputs are available). We code it in a mainstream programming language (C/C++, Python, among others), which has completely sequential semantics (zero parallelism) to make sense of reads and writes to memory. As illustrated in Figure 1, compilers sweat mightily to rediscover some of the lost parallelism in their internal CDFGs (control and data flow graphs), and then produce machine code that, again, is completely sequential. When we execute this on a modern von Neumann CPU, wide-issue, out-of-order circuits once again sweat mightily (burning power) to rediscover parallelism.

The 1970s through early 1990s saw several attempts to avoid these “unnecessary” sequentializations (green circles in Figure 2). Dataflow languages (mostly purely functional) and machine code (dataflow graphs) retained parallelism from the math. Instead of a program counter, each instruction directly named its successor(s) receiving its outputs. Dataflow CPUs directly executed this graph machine code. Nowadays this computation model goes by the acronym EDGE, for explicit dataflow graph execution.

So, why aren’t we all using EDGE machines today? A short answer is that they never quite mastered spatial or temporal locality and were subpar on inherently sequential code regions. In contrast, modern von Neumann CPUs excel at this, managing efficient flow of data between circuits that are fast-and-expensive (registers, wires), medium (caches), and slow-and-cheap (DRAMs).

The following paper by Tony Nowatzki, Vinay Gangadhar, and Karthikeyan Sankaralingam describes an innovative approach to exploit both models. From the CDFG, their compiler generates both traditional sequential machine code and a data graph, each being executed on appropriate circuits (blue squares in Figure 2), with efficient hand-off mechanisms. The authors describe extensive studies to validate the viability of this approach for existing codes.

EDGE computing is undergoing a renaissance, with many researchers pursuing related ideas. There are indications that big industry players are also contemplating this direction.[a]

[a] Morgan, T.P. Intel’s Exascale dataflow engine drops x86 and von Neumann. The NEXT Platform, Aug 30, 2018.

Rishiyur S. Nikhil is Chief Technical Officer at Bluespec, Inc., a semiconductor tool design company in Framingham, MA, USA.

Copyright held by author.

Back to the Edge
By Rishiyur S. Nikhil
Figure 1. Parallelism during coding, compilation, and execution.
Figure 2. Alternative strategies for exploiting parallelism.
[Figure label text: algorithm; dataflow language (Val, Id, Sisal, pH, …)]