and to a 27 W Intel Atom-based front-end node using two
16-port Netgear GS116 GigE Ethernet switches.
Evaluation workload: We show query performance for
256 byte and 1KB values. We select these sizes as proxies
for small text posts, user reviews or status messages, image
thumbnails, and so on. They represent a quite challenging
regime for conventional disk-bound systems and stress the
limited memory and CPU of our wimpy nodes.
4.1. Individual node performance
We benchmark the I/O capability of the FAWN nodes using
iozone and Flexible I/O tester. The flash is formatted with
the ext2 filesystem. These tests read and write 1KB entries,
the lowest record size available in iozone. The filesystem I/O
performance using a 3.5GB file is shown in Table 1.
4.1.1. FAWN-DS single node local benchmarks
Lookup speed: This test shows the query throughput
achieved by a local client issuing queries for randomly
distributed, existing keys on a single node. We report the
average of three runs (the standard deviations were below
5%). Table 2 shows FAWN-DS 1KB and 256 byte random
read queries/s as a function of the DS size. If the datastore
fits in the buffer cache, the node locally retrieves 50,000–
85,000 queries/s. As the datastore exceeds the 256MB of
RAM available on the nodes, a larger fraction of requests
go to flash.
FAWN-DS imposes modest overhead from hash lookups, data copies, and key comparisons; and it must read
slightly more data than the iozone tests (each stored entry
has a header). The query throughput, however, remains
high: Tests reading a 3.5GB datastore using 1KB values
achieved 1,150 queries/s compared to 1,424 queries/s
from the filesystem. Using 256 byte entries achieved 1,298
queries/s from a 3. 5 GB datastore. By comparison, the raw
filesystem achieved 1,454 random 256 byte reads/s using
Flexible I/O.
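To make the sources of this overhead concrete, the following sketch (in Python, and not the authors' code) mimics the lookup path described above: a small in-memory index keyed by a fragment of the key hash points at an offset in the on-flash log, the entry header is read and its stored key hash compared, and the value is then read. The entry layout, fragment size, and collision handling are simplified assumptions for illustration.

import struct, hashlib

# Assumed on-flash entry layout: 20-byte key hash, 4-byte value length, then the value.
HEADER = struct.Struct("<20sI")

def lookup(index, log, key):
    # index: dict mapping a key-hash fragment to a log offset; log: open log file on flash.
    h = hashlib.sha1(key).digest()
    offset = index.get(h[:4])            # consult the small in-memory index
    if offset is None:
        return None                      # fragment absent: key is not stored
    log.seek(offset)
    stored_hash, length = HEADER.unpack(log.read(HEADER.size))  # read the entry header
    if stored_hash != h:                 # fragment collision with a different key
        return None                      # (collision handling simplified here)
    return log.read(length)              # read the value that follows the header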
Table 1. Baseline CompactFlash statistics for 1KB entries.
QPS = queries/second.

Seq. Read    Rand. Read    Seq. Write    Rand. Write
28.5 MB/s    1,424 QPS     24 MB/s      110 QPS

Table 2. Local random read speed of FAWN-DS.

DS Size    1KB Rand. Read (queries/s)    256 byte Rand. Read (queries/s)
10 KB      72,352                        85,012
125 MB     51,968                        65,412
250 MB     6,824                         5,902
500 MB     2,016                         2,449
1 GB       1,595                         1,964
2 GB       1,446                         1,613
3.5 GB     1,150                         1,298

Bulk store speed: The log structure of FAWN-DS ensures
that data insertion is entirely sequential. Inserting 2 million
entries of 1KB each (2GB total) into a single FAWN-DS log
proceeds at 23.2MB/s (nearly 24,000 entries/s), which is 96%
of the raw speed at which the flash can be written through the
filesystem.
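The reason bulk inserts approach the raw sequential write speed can be seen in a minimal sketch of the append path (a hypothetical layout, not the authors' code): every insert writes a small header and the value to the tail of one log file and updates the in-memory index, so the flash sees only sequential writes.

import struct, hashlib

HEADER = struct.Struct("<20sI")          # assumed header: key hash + value length

def bulk_insert(log_path, items):
    index = {}
    with open(log_path, "ab") as log:
        for key, value in items:
            offset = log.tell()                    # current tail of the log
            h = hashlib.sha1(key).digest()
            log.write(HEADER.pack(h, len(value)))  # append header...
            log.write(value)                       # ...then the value, sequentially
            index[h[:4]] = offset                  # point the in-memory index at the new entry
    return index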
Put speed: Each FAWN-KV node has R × V FAWN-DS files:
Each virtual ID adds one primary data range, plus an additional R − 1 replicated ranges. A node receiving puts for different ranges will concurrently append to a small number of
files (“semi-random writes”). Good semi-random write performance is central to FAWN-DS’s per-range data layout that
enables single-pass maintenance operations. Our recent
work confirms that modern flash devices can provide good
semi-random write performance.1
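The write pattern this produces can be sketched as follows (the range-selection rule and file handling are illustrative stand-ins, not FAWN-KV's actual hashing): puts land on a small, fixed set of per-range logs, so writes are random across files but strictly sequential within each file.

import hashlib

class RangeLogs:
    def __init__(self, paths):
        # One append-only log per data range this node owns (R x V of them).
        self.logs = [open(p, "ab") for p in paths]

    def put(self, key, value):
        h = hashlib.sha1(key).digest()
        owner = h[0] % len(self.logs)    # pick the range file that owns this key (illustrative)
        self.logs[owner].write(value)    # sequential append within that file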
4.1.2. Comparison with BerkeleyDB
To understand the benefit of FAWN-DS’s log structure, we
compare with a general purpose disk-based database that
is not optimized for flash. BerkeleyDB provides a simple
put/get interface, can be used without heavy-weight transactions or rollback, and performs well vs. other memory
or disk-based databases. We configured BerkeleyDB using
both its default settings and using the reference guide's suggestions for flash-based operation.3 The best performance
we achieved required 6 hours to insert 7 million 200 byte
entries to create a 1.5GB B-Tree database. This corresponds
to an insert rate of 0.07MB/s.
The problem was, of course, small writes: When the
BDB store was larger than the available RAM on the nodes
(<256MB), BDB had to flush pages to disk, causing many
writes that were much smaller than the size of an erase
block.
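A back-of-the-envelope calculation using the numbers above shows the scale of the penalty; the page and erase-block sizes are typical values assumed for illustration, not measurements from the paper.

raw_seq_write_mb_s = 24.0      # Table 1: raw sequential write speed of the flash
bdb_insert_mb_s = 0.07         # measured BDB insert rate from the text
print(f"slowdown vs. sequential writes: {raw_seq_write_mb_s / bdb_insert_mb_s:.0f}x")   # ~343x

page_size = 4 * 1024           # typical B-Tree page size (assumption)
erase_block = 128 * 1024       # typical flash erase block size (assumption)
print(f"worst-case write amplification: {erase_block // page_size}x per page flush")    # 32x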
That comparing FAWN-DS and BDB seems unfair is exactly the point: Even a well-understood, high-performance
database will perform poorly when its write pattern has
not been specifically optimized for flash's characteristics.
We evaluated BDB on top of NILFS2, a log-structured
Linux filesystem for block devices, to understand whether
log-structured writing could turn the random writes into
sequential writes. Unfortunately, this combination was
not suitable because of the amount of metadata created for
small writes for use in filesystem checkpointing and rollback, features not needed for FAWN-KV—writing 200MB
worth of 256 byte key-value pairs generated 3.5GB of metadata. Other existing Linux log-structured flash filesystems,
such as JFFS2, are designed to work on raw flash, but modern SSDs, compact flash, and SD cards all include a Flash
Translation Layer that hides the raw flash chips. While
future improvements to filesystems can speed up naive DB
performance on flash, the pure log structure of FAWN-DS
remains necessary even if we could use a more conventional back end: It provides the basis for replication and
consistency across an array of nodes.
4.1.3. Read-intensive vs. write-intensive workloads
Most read-intensive workloads have some writes. For example, Facebook’s memcached workloads have a 1:6 ratio of
application-level puts to gets.11 We therefore measured the
aggregate query rate as the fraction of puts ranging from 0