using and applying cryptocurrencies:
privacy, security, and interfacing with
the real world. These will be fertile areas of research and development in
the years to come.
Arvind Narayanan is an assistant professor of computer
science at Princeton, where he leads a research team
investigating the security, anonymity, and stability
of cryptocurrencies as well as novel applications
of blockchains. He also leads the Princeton Web
Transparency and Accountability Project, which uncovers how companies collect and use our personal information.
Andrew Miller is an assistant professor in Electrical
and Computer Engineering at the University of Illinois
at Urbana-Champaign. He is an associate director of the
Initiative for Cryptocurrencies and Contracts (IC3) at
Cornell and an advisor to the Zcash project.
Hardware for Deep Learning
By Song Han
Deep neural networks (DNNs) have evolved into the state-of-the-art technique for machine-learning tasks ranging from computer vision to speech recognition to natural language processing. Deep-learning algorithms, however, are both computationally and memory intensive, making them power-hungry to deploy on embedded systems. Running deep-learning algorithms in real time at subwatt power consumption would be ideal for embedded devices, but general-purpose hardware does not provide the energy efficiency needed to deploy such DNNs. The three papers presented here suggest ways to solve this problem with specialized hardware.
The Compressed Model
Han, S., Liu, X., Mao, H., Pu, J., Pedram, A.,
Horowitz, M.A., Dally, W.J.
EIE: Efficient inference engine on compressed
deep neural network. In Proceedings of
the International Symposium on Computer
Architecture, 2016.
https://arxiv.org/pdf/1602.01528v2.pdf.
This work combines algorithm optimization with hardware specialization. EIE (efficient inference engine) starts with a deep-learning model-compression algorithm that first prunes neural networks by 9–13 times without hurting accuracy, saving both computation and memory; next, combining pruning with weight sharing and Huffman coding, it compresses the network 35–49 times overall, again without hurting accuracy. On top of the compression algorithm, EIE is a hardware accelerator that works directly on the compressed model and solves the problem of the irregular computation patterns (sparsity and indirection) brought about by compression. EIE efficiently parallelizes the compressed model across multiple processing elements and proposes an efficient way of partitioning and load balancing both storage and computation. The result is a speedup of 189 times over a modern CPU and 13 times over a GPU, with energy-efficiency improvements of 24,000 times and 3,400 times, respectively.
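For intuition, here is a minimal NumPy sketch of the two compression steps EIE builds on, magnitude pruning and codebook-based weight sharing (Huffman coding omitted), followed by a sparse matrix-vector product of the kind the accelerator parallelizes. The thresholds, cluster counts, and function names are illustrative assumptions, not taken from the paper.

    # Sketch: magnitude pruning, weight sharing via a small codebook,
    # and a sparse matrix-vector product over the surviving weights.
    import numpy as np

    def prune(weights, sparsity=0.9):
        # Zero out the smallest-magnitude weights.
        k = int(weights.size * sparsity)
        threshold = np.sort(np.abs(weights), axis=None)[k]
        return np.where(np.abs(weights) < threshold, 0.0, weights)

    def share_weights(weights, n_clusters=16):
        # Quantize surviving weights to a 16-entry codebook via 1-D
        # k-means, so each nonzero is stored as a 4-bit index.
        nz = weights[weights != 0]
        centroids = np.linspace(nz.min(), nz.max(), n_clusters)
        for _ in range(20):
            idx = np.abs(nz[:, None] - centroids[None, :]).argmin(axis=1)
            for c in range(n_clusters):
                if np.any(idx == c):
                    centroids[c] = nz[idx == c].mean()
        codes = np.abs(weights[..., None] - centroids).argmin(axis=-1)
        return np.where(weights != 0, centroids[codes], 0.0)

    def sparse_matvec(weights, x):
        # Multiply-accumulate over nonzeros only, skipping pruned weights.
        y = np.zeros(weights.shape[0])
        for i in range(weights.shape[0]):
            cols = np.nonzero(weights[i])[0]   # indices of surviving weights
            y[i] = weights[i, cols] @ x[cols]
        return y

    W = share_weights(prune(np.random.randn(64, 64)))
    y = sparse_matvec(W, np.random.randn(64))

The sparse product is where the hardware problem appears: the nonzero positions are irregular, so each multiply needs an index lookup (indirection), which is exactly the access pattern EIE's processing elements are built to handle.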
Optimized Dataflow
Chen, Y.-H., Emer, J., Sze, V.
Eyeriss: A spatial architecture for energy-efficient dataflow for convolutional neural
networks. In Proceedings of the International
Symposium on Computer Architecture,
2016. https://www.researchgate.net/publication/301891800_Eyeriss_A_Spatial_Architecture_for_Energy-Efficient_Dataflow_for_Convolutional_Neural_Networks.
Deep-learning algorithms are memory intensive, and accessing memory consumes more than two orders of magnitude more energy than ALU (arithmetic logic unit) operations. It is therefore critical to develop dataflows that reduce memory references. Eyeriss presents a novel dataflow called RS (row stationary) that minimizes data-movement energy consumption on a spatial architecture. This is realized by exploiting local data reuse of filter weights and feature-map pixels (that is, activations) in the high-dimensional convolutions, and by minimizing data movement of partial-sum accumulations. Unlike dataflows used in existing designs, which reduce only certain types of data movement, the proposed RS dataflow can adapt to different CNN (convolutional neural network) shape configurations and reduces all types of data movement through maximum use of PE (processing engine) local storage, direct inter-PE communication, and spatial parallelism.
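To see why the dataflow matters (often-cited 45-nm estimates put a DRAM access at roughly 640 pJ versus a few pJ for a 32-bit multiply), consider the reuse hiding in a plain convolution loop nest. The schematic Python below marks the three kinds of data movement RS keeps local; it illustrates the reuse opportunities, not Eyeriss's actual PE mapping.

    # A single-channel 2-D convolution with the three data types that
    # RS keeps local annotated in comments. Illustrative only.
    import numpy as np

    def conv2d(ifmap, filt):
        H, W = ifmap.shape
        R, S = filt.shape
        out = np.zeros((H - R + 1, W - S + 1))
        for y in range(out.shape[0]):
            for x in range(out.shape[1]):
                acc = 0.0                      # partial sum: accumulated
                for r in range(R):             # locally, never spilled
                    for s in range(S):
                        # filt[r, s] is reused at every output position;
                        # ifmap[y+r, x+s] is reused by overlapping windows
                        acc += filt[r, s] * ifmap[y + r, x + s]
                out[y, x] = acc                # written back exactly once
        return out

    out = conv2d(np.random.randn(16, 16), np.random.randn(3, 3))

Roughly speaking, Eyeriss maps this onto its PE array by pinning a filter row and a feature-map row in each PE's register file (hence "row stationary") and accumulating partial sums between vertically adjacent PEs, so none of the three data types makes a round trip to DRAM on every use.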
Small-Footprint Accelerator
Chen, T., Wang, J., Du, Z., Wu, C.,
Sun, N., Chen, Y., Temam, O.
DianNao: A small-footprint high-throughput
accelerator for ubiquitous machine-learning. In
Proceedings of the International Conference
on Architectural Support for Programming
Languages and Operating Systems, 2014.
http://pages.saclay.inria.fr/olivier.temam/files/eval/CDSWWCT14.pdf.
Recent state-of-the-art CNNs and DNNs are characterized by their large sizes: with layers of thousands of neurons and millions of synapses, they place heavy demands on memory. DianNao is an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy. It takes advantage of dedicated storage, which is key to achieving good performance and power. By carefully exploiting the locality properties of neural-network models, and by introducing storage structures custom designed to take advantage of these properties, DianNao shows it is possible to build a machine-learning accelerator capable of high performance in a very small footprint: it achieves a speedup of 117.87 times and an energy reduction of 21.08 times over a 128-bit 2-GHz SIMD (single instruction, multiple data) core with a normal cache hierarchy.
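As a rough illustration of this buffer-centric design, the Python sketch below stages a fully connected layer through three small arrays named after DianNao's on-chip buffers: NBin for input neurons, SB for synapses, and NBout for output partial sums. The tile sizes and the code itself are illustrative assumptions, not the paper's implementation.

    # A fully connected layer staged through three small buffers modeled
    # on DianNao's NBin, SB, and NBout. Tile sizes are illustrative.
    import numpy as np

    def fc_layer_tiled(x, W, Ti=16, To=16):
        n_out, n_in = W.shape
        y = np.zeros(n_out)
        for oo in range(0, n_out, To):
            nbout = np.zeros(To)                  # partial sums stay on chip
            for ii in range(0, n_in, Ti):
                nbin = x[ii:ii + Ti]              # stage an input tile (NBin)
                sb = W[oo:oo + To, ii:ii + Ti]    # stage a weight tile (SB)
                nbout[:sb.shape[0]] += sb @ nbin  # multiply-accumulate
            y[oo:oo + To] = nbout[:min(To, n_out - oo)]
        return y

    y = fc_layer_tiled(np.random.randn(1024), np.random.randn(256, 1024))

The point of the tiling is locality: each weight tile is fetched from memory once, and every partial sum stays in NBout until it is final, which is the behavior DianNao's dedicated storage structures are designed to capture.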
Looking Forward
Specialized hardware will be a key solution for making deep-learning algorithms faster and more energy efficient. Reducing the memory footprint is the most critical issue. The papers presented here demonstrate three ways to attack the problem: optimize both the algorithm and the hardware, and accelerate the compressed model; use an optimized dataflow to schedule data movement; and design dedicated memory buffers for the weights, input activations, and output activations. We can look forward to seeing more artificial-intelligence applications benefit from such hardware optimizations, putting AI everywhere, in every device in our lives.
Song Han is a Ph.D. student at Stanford University,
Stanford, CA. He proposed deep compression that can
compress state-of-the-art CNNs by 10–49 times and
designed EIE (efficient inference engine), a hardware
architecture that does inference directly on the
compressed sparse model.
Copyright held by owner(s)/authors. Publication rights licensed to ACM.