1. Arvind, K., Nikhil, R.S. Executing a program on the MIT tagged-token dataflow architecture. IEEE Trans. Comput. 39, 3 (1990), 300–318.
2. Budiu, M., Artigas, P.V., Goldstein, S.C. Dataflow: A complement to superscalar. In ISPASS '05: Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software (March 20–22, 2005), IEEE Computer Society, Washington, DC.
3. Burger, D., Keckler, S.W., McKinley, K.S., Dahlin, M., John, L.K., Lin, C., Moore, C.R., Burrill, J., McDonald, R.G., Yoder, W., the TRIPS Team. Scaling to the end of silicon with EDGE architectures. Computer 37, 7 (July 2004), 44–55.
4. Clark, N., Kudlur, M., Park, H., Mahlke, S., Flautner, K. Application-specific processing on a general-purpose core via transparent instruction set customization. In MICRO 37: Proceedings of the 37th Annual IEEE/ACM International Symposium on Microarchitecture (Portland, Oregon, December 04–08, 2004), IEEE Computer Society, Washington, DC.
5. Gebhart, M., Maher, B.A., Coons, K.E., Diamond, J., Gratz, P., Marino, M., Ranganathan, N., Robatmili, B., Smith, A., Burrill, J., Keckler, S.W., Burger, D., McKinley, K.S. An evaluation of the TRIPS computer system. In ASPLOS XIV: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems (Washington, DC, March 07–11, 2009), ACM, New York, NY, 1–12.
6. Govindaraju, V., Ho, C.-H., Nowatzki, T., Chhugani, J., Satish, N., Sankaralingam, K.
Figure 12. Program phase affinity by application characteristics. (a) Prior specialization techniques; (b) enabled specialization techniques. (Panels plot affinity for vector, explicit-dataflow, and simple-core energy benefit.) Memory ranges from regular and data-independent, to irregular and data-dependent but with parallelism, to latency bound with no parallelism. Control can range from noncritical or not present, to critical but repeating, to not repeating but predictable, to unpredictable and data-dependent.
concurrency, data-reuse, and coordination.14 A dataflow model of computation is especially suitable for exploiting the first three principles for massively parallel computation, whereas a Von Neumann model excels at the coordination of control decisions and ordering. We further addressed programmable specialization by proposing a Von Neumann/dataflow architecture called stream-dataflow,13 which specifies memory access and communication as streams, enabling effective specialization of data-reuse in caches and scratchpad memories.
Future directions: The promise of dataflow specialization in the accelerator context is to enable freedom from
application-specific hardware development, leading to two
important future directions.
• An accelerator architecture: The high energy- and area-efficiency of a Von Neumann/dataflow accelerator, coupled with a well-defined hardware/software interface, enables the almost paradoxical concept of an accelerator architecture. We envision that a dataflow-specialized ISA such as stream-dataflow, along with essential hardware specialization principles, can serve as the basis for future innovation in specialization architectures. Its high efficiency makes it an excellent baseline comparison design for new accelerators, and the ease of modifying its hardware/software interface can enable integration of novel forms of computation and memory specialization for challenging workloads.
• Compilation: How a given program leverages Von Neumann and dataflow mechanisms can have tremendous influence on attainable efficiency, and some methodology is required to navigate this design space. The fundamental compiler problem remains extracting and expressing parallelism and locality. The execution model and application domains make these problems easier to address. Applications amenable to acceleration are generally well-behaved (keeping pointer use to a minimum or avoiding it, etc.). The execution model and architecture provide interfaces to cleanly expose the application's parallelism and locality to the hardware. This opens up exciting opportunities in compiler and programming languages research to target accelerators.
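As a toy illustration of navigating this design space, one could imagine a profile-guided pass that routes each program region to the engine whose strengths match its behavior, in the spirit of the phase affinities of Figure 12. The function name, region statistics, and thresholds below are all invented for illustration; they are not taken from SEED or stream-dataflow:

```python
# Hypothetical phase-routing heuristic (invented thresholds and stats):
# regions with unpredictable control stay on the Von Neumann core;
# regular, parallel regions go to the dataflow engine.

def choose_engine(region):
    if region["branch_mispredict_rate"] > 0.05:
        return "von-neumann"      # unpredictable control: keep on the core
    if region["memory_regular"] and region["parallelism"] > 4:
        return "dataflow"         # regular, parallel work: dataflow engine
    return "von-neumann"          # default: the general-purpose core

regions = [
    {"name": "fft_inner", "branch_mispredict_rate": 0.01,
     "memory_regular": True, "parallelism": 16},
    {"name": "parser", "branch_mispredict_rate": 0.12,
     "memory_regular": False, "parallelism": 2},
]
for r in regions:
    print(r["name"], "->", choose_engine(r))  # fft_inner -> dataflow,
                                              # parser -> von-neumann
```

A real compiler would of course derive such statistics from profiling or static analysis and tune the thresholds per design; the point is only that the routing decision can be made mechanically once per-region behavior is exposed.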
This article observed a synergy between Von Neumann and dataflow processors due to variance in program behaviors at a fine grain, and used this insight to build a practical processor, SEED. It enables potentially disruptive performance and energy-efficiency trade-offs for general-purpose processors, pushing the boundary of what is possible given only a modestly complex core. This approach of specializing for program behaviors using heterogeneous dataflow architectures could open a new design space, ultimately reducing the importance of aggressive OOO designs and leading to greater opportunity for radical architecture innovation.