Sequential-read workloads are similar, but the constants
depend strongly on the per-byte processing required.
Traditional cluster architectures retain a place for CPU-bound workloads. We do note, however, that architectures such
as IBM’s BlueGene successfully apply large numbers of low-power, efficient processors to many supercomputing applications, although they augment their wimpy processors with
custom floating-point units to do so.
Our definition of “total cost of ownership” ignores
several notable costs. In comparison to traditional architectures, FAWN should reduce power and cooling infrastructure costs but may increase network-related hardware and
power costs due to the need for more switches. Our current
hardware prototype improves work done per volume, thus
reducing costs associated with datacenter rack or floor
space. Finally, our analysis assumes that cluster software
developers can engineer away the human costs of management, an optimistic assumption for all architectures.
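To show which costs a simple TCO figure does and does not capture, here is a minimal Python sketch. It assumes a model of purchase price plus electricity over the deployment lifetime. The `NodeSpec` type, the `simple_tco_usd` function, and all parameter values (per-node price, wattage, lifetime, electricity price) are hypothetical. They are not the paper's actual model or measurements.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 24 * 365  # ignores leap years


@dataclass
class NodeSpec:
    """Hypothetical per-node inputs; the values used below are illustrative only."""
    capital_cost_usd: float  # purchase price of one node
    avg_power_watts: float   # average wall-power draw of one node


def simple_tco_usd(node: NodeSpec, num_nodes: int, years: float,
                   usd_per_kwh: float) -> float:
    """Capital cost plus electricity cost, summed over all nodes.

    Deliberately leaves out cooling and power infrastructure, network
    switches, rack/floor space, and management labor: the kinds of
    costs the discussion says the definition ignores.
    """
    # Watts * hours gives watt-hours; divide by 1000 for kWh.
    energy_kwh = node.avg_power_watts * HOURS_PER_YEAR * years / 1000.0
    per_node_usd = node.capital_cost_usd + energy_kwh * usd_per_kwh
    return per_node_usd * num_nodes


if __name__ == "__main__":
    # Made-up inputs, chosen only to exercise the arithmetic.
    wimpy = NodeSpec(capital_cost_usd=250.0, avg_power_watts=15.0)
    print(round(simple_tco_usd(wimpy, num_nodes=100, years=3, usd_per_kwh=0.10), 2))
```

With these made-up inputs, each node draws about 394 kWh over three years. That adds roughly $39 of electricity to a $250 purchase price. Adding switch, cooling, or floor-space terms would be the natural way to extend the sketch toward the costs listed above.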
We similarly ignore issues such as ease of programming,
though we selected an x86-based wimpy platform for ease of development.
6. RELATED WORK
Several projects are using low-power processors for datacenter workloads to reduce energy consumption [5, 8, 14, 19]. These
systems leverage low-cost, low-power commodity components for datacenter systems, similarly arguing that this
approach can achieve the highest work per dollar and per
Joule. More recently, ultra-low-power server systems have
become commercially available, with companies such as
SeaMicro, Marvell, Calxeda, and ZT Systems producing low-power datacenter computing systems based on Intel Atom
and ARM platforms.
FAWN builds upon these observations by demonstrating
the importance of re-architecting the software layers in obtaining the potential energy efficiency such hardware can provide.
The FAWN approach uses nodes that target the “sweet spot”
of per-node energy efficiency, typically operating at about
half the frequency of the fastest available CPUs. Our experience in designing systems using this approach, often
coupled with fast flash memory, has shown that it has substantial potential to improve energy efficiency, but that
these improvements may come at the cost of re-architecting
software or algorithms to operate with less memory, slower
CPUs, or the quirks of flash memory. The FAWN-KV key-value system presented here is one such example. By successfully adapting the software to this efficient hardware,
our then four-year-old FAWN nodes delivered over an order
of magnitude more queries per Joule than conventional
disk-based systems.
Our ongoing experience with newer FAWN-style systems
shows that the energy efficiency benefits of this approach remain achievable,
but that further systems challenges, such as high kernel
I/O overhead, begin to come into play. In this light, we view
our experience with FAWN as a potential harbinger of the
systems challenges that are likely to arise for future many-core energy-efficient systems.
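The conclusion's headline metric, queries per Joule, is sustained throughput divided by average power, because one watt is one joule per second. The minimal sketch below computes it for a single system and for a cluster. The function names and every number in the example are hypothetical. They illustrate the arithmetic only and do not reproduce the paper's measurements.

```python
def queries_per_joule(queries_per_second: float, avg_power_watts: float) -> float:
    """Energy efficiency of a serving system.

    One watt is one joule per second, so (queries/s) / (J/s) = queries/J.
    """
    if avg_power_watts <= 0:
        raise ValueError("power must be positive")
    return queries_per_second / avg_power_watts


def cluster_queries_per_joule(nodes: list[tuple[float, float]]) -> float:
    """Aggregate efficiency of a cluster given (qps, watts) for each node.

    Uses total throughput over total power, not the mean of per-node ratios.
    """
    total_qps = sum(qps for qps, _ in nodes)
    total_watts = sum(watts for _, watts in nodes)
    return queries_per_joule(total_qps, total_watts)


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the arithmetic.
    wimpy_cluster = [(1000.0, 5.0)] * 10  # ten small nodes
    conventional = [(5000.0, 250.0)]      # one large server
    wimpy = cluster_queries_per_joule(wimpy_cluster)
    big = cluster_queries_per_joule(conventional)
    print(wimpy, big, wimpy / big)
```

With these made-up inputs, the ten-node cluster reaches 200 queries per Joule, compared with 20 for the single server. Aggregating as total throughput over total power, rather than averaging per-node ratios, keeps the metric correct when nodes differ.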
ACKNOWLEDGMENTS
This work was supported in part by gifts from Network
Appliance, Google, and Intel Corporation, by grant
CCF-0964474 from the National Science Foundation, and
by graduate fellowships from NSF, IBM, and APC. We
thank our OSDI and SOSP reviewers, Vyas Sekar, and
Mehul Shah, and we thank Lorenzo Alvisi for shepherding
the work for SOSP. Iulian Moraru provided feedback and
1. Andersen, D.G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., Vasudevan, V. FAWN: A fast array of wimpy nodes. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP) (Big Sky, MT, October 2009).
2. Barroso, L.A., Hölzle, U. The case for energy-proportional computing. Computer 40, 12
3. Memory-only or flash configurations.
4. Bowman, W., Cardwell, N., Kozyrakis, C., Romer, C., Wang, H. Evaluation of existing architectures in IRAM systems. In Workshop on Mixing Logic and DRAM, 24th International Symposium on Computer Architecture (Denver, CO, June 1997).
5. Caulfield, A.M., Grupp, L.M., Swanson, S. Gordon: Using flash memory to build fast, power-efficient clusters for data-intensive applications. In 14th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’09) (San Diego, CA,
6. Chase, J.S., Anderson, D., Thakar, P., Vahdat, A., Doyle, R. Managing energy and server resources in hosting centers. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP) (Banff, AB, Canada, October 2001).
7. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W. Dynamo: Amazon’s highly available key-value store. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP) (Stevenson, WA, October 2007).
8. Hamilton, J. Cooperative expendable micro-slice servers (CEMS): Low cost, low power servers for Internet scale
9. Penryn Press release. http://www.
10. The Journaling Flash File System.
David G. Andersen, Jason Franklin, Amar Phanishayee, Lawrence Tan, and Vijay Vasudevan, Carnegie Mellon University
11. Johnson, B. Facebook, personal communication (November 2008).
12. Katz, R.H. Tech titans building boom. IEEE Spectrum (February 2009).
13. Lamport, L. The part-time parliament. ACM Trans. Comput. Syst. 16, 2,
14. Lim, K., Ranganathan, P., Chang, J., Patel, C., Mudge, T., Reinhardt, S. Understanding and designing new server architectures for emerging
In International Symposium on Computer Architecture (ISCA) (Beijing, China, June 2008).
15. Nath, S., Gibbons, P.B. Online maintenance of very large random samples on flash storage. In Proceedings of VLDB (Auckland, New Zealand, August 2008).
16. Nath, S., Kansal, A. FlashDB: Dynamic self-tuning database for NAND flash. In Proceedings of ACM/IEEE International Conference on Information Processing in Sensor Networks (Cambridge, MA, April 2007).
17. Polte, M., Simsa, J., Gibson, G. Enabling enterprise solid state disks performance. In Proceedings of the Workshop on Integrating Solid-State Memory into the Storage Hierarchy (Washington, DC, March 2009).
18. Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., Balakrishnan, H. Chord: A scalable peer-to-peer lookup service for Internet applications. August 2001. http://portal.acm.org/
19. Szalay, A., Bell, G., Terzis, A., White, A., Vandenberg, J. Low power Amdahl blades for data intensive computing,
20. Tolia, N., Wang, Z., Marwah, M., Bash, C., Ranganathan, P., Zhu, X. Delivering energy proportionality with non energy-proportional systems: Optimizing the ensemble. In Proceedings of HotPower (Palo Alto, CA, December 2008).
21. van Renesse, R., Schneider, F.B. Chain replication for supporting high throughput and availability. In Proceedings of the 6th USENIX OSDI (San Francisco, CA, December 2004).
Michael Kaminsky, Intel Labs
© 2011 ACM 0001-0782/11/07 $10.00