FAWN: A Fast Array
of Wimpy Nodes
This paper presents a fast array of wimpy nodes—FAWN—
an approach for achieving low-power data-intensive data-center computing. FAWN couples low-power processors
to small amounts of local flash storage, balancing computation and I/O capabilities. FAWN optimizes for per-node
energy efficiency to enable efficient, massively parallel
access to data.
The key contributions of this paper are the principles of
the FAWN approach and the design and implementation of
FAWN-KV—a consistent, replicated, highly available, and
high-performance key-value storage system built on a FAWN
prototype. Our design centers around purely log-structured
datastores that provide the basis for high performance on
flash storage, as well as for replication and consistency
obtained using chain replication on a consistent hashing
ring. Our evaluation demonstrates that FAWN clusters can
handle roughly 350 key-value queries per Joule of energy—
two orders of magnitude more than a disk-based system.
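As a rough illustration of the placement scheme named above, the following sketch (in Python, with hypothetical names; it is not the FAWN-KV implementation) maps a key onto a consistent hashing ring and selects a chain of successor nodes for chain replication.

import bisect
import hashlib

def ring_hash(value):
    # Map a string onto the ring's 160-bit identifier space.
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class Ring:
    def __init__(self, nodes, replicas=3):
        self.replicas = replicas
        # Each node owns a position on the ring, sorted clockwise.
        self.points = sorted((ring_hash(n), n) for n in nodes)

    def chain_for(self, key):
        # The key's first clockwise successor is the chain head; the
        # next replicas-1 nodes complete the replication chain.
        positions = [p for p, _ in self.points]
        start = bisect.bisect_right(positions, ring_hash(key)) % len(self.points)
        return [self.points[(start + i) % len(self.points)][1]
                for i in range(min(self.replicas, len(self.points)))]

ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"])
print(ring.chain_for("user:1234"))  # e.g., ['node-c', 'node-d', 'node-e']

In chain replication, writes enter at the head of such a chain and propagate to the tail, while reads are served by the tail, so clients observe only fully replicated values; placing chains on a consistent hashing ring keeps membership changes local to neighboring chains.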
1. INTRODUCTION
Large-scale data-intensive applications, such as high-performance key-value storage systems, are growing in both
size and importance; they are now critical parts of major
Internet services such as Amazon (Dynamo [7]), LinkedIn
(Voldemort), and Facebook (memcached).
The workloads these systems support share several characteristics: they are I/O-intensive rather than computation-intensive, requiring random access over large datasets; they are massively
parallel, with thousands of concurrent, mostly independent
operations; their high load requires large clusters to support them; and the size of objects stored is typically small,
for example, 1 KB values for thumbnail images and hundreds
of bytes for wall posts and Twitter messages.
The clusters that serve these workloads must provide both
high performance and low-cost operation. Unfortunately,
small-object random-access workloads are particularly ill
served by conventional disk-based or memory-based clusters. The poor seek performance of disks makes disk-based
systems inefficient in terms of both system performance
and performance per Watt. High-performance DRAM-based
clusters, storing terabytes or petabytes of data, are expensive
and power-hungry: Two high-speed DRAM DIMMs can consume as much energy as a 1TB disk.
The power draw of these clusters is becoming an increas-
ing fraction of their cost—up to 50% of the 3-year total cost
of owning a computer. The density of the datacenters that
house them is in turn limited by their ability to supply and
cool 10–20 kW of power per rack and up to 10–20 MW per
datacenter [12]. Future datacenters may require as much as
200 MW [12], and datacenters are being constructed today with
dedicated electrical substations to feed them.
The original version of this paper was published in
Proceedings of the 22nd ACM Symposium on Operating
Systems Principles, October 2009.