3.2. Understanding Flash Storage
Flash provides a non-volatile memory store with several significant benefits over typical magnetic hard disks for random-access, read-intensive workloads—but it also introduces several challenges. Three characteristics of flash underlie the design of the FAWN-KV system described in this section:

1. Fast random reads: (≪1 ms) up to 175 times faster than random reads on magnetic disk.17

2. Efficient I/O: Many flash devices consume less than 1 W even under heavy load, whereas mechanical disks can consume over 10 W at load.

3. Slow random writes: Small writes on flash are expensive. Updating a single page requires first erasing an entire erase block (128–256KB) of pages and then writing the modified block in its entirety. Updating a single byte of data is therefore as expensive as writing an entire block of pages.16

Modern devices improve random write performance using write buffering and preemptive block erasure. These techniques improve performance for short bursts of writes, but sustained random writes still underperform.17

These performance problems motivate log-structured techniques for flash filesystems and data structures.10, 15, 16 These same considerations inform the design of FAWN's node storage management system, described next.
3.3. The FAWN Datastore
FAWN-DS is a log-structured key-value store. Each store contains values for the key range associated with one virtual ID. It
acts to clients like a disk-based hash table that supports Store,
Lookup, and Delete.
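To make this interface concrete, the following C sketch shows one way the client-facing API could look. The names (fawn_key_t, fawnds_store, and so on) and signatures are hypothetical; only the 160-bit key size and the three operations come from the text.

#include <stdint.h>

/* 160-bit (20-byte) key, as used throughout FAWN-DS. */
typedef struct { uint8_t bytes[20]; } fawn_key_t;

/* One FAWN-DS datastore: holds the key range of a single virtual ID. */
typedef struct fawnds fawnds_t;

/* Append a new value for key to the Data Log and update the Hash Index. */
int fawnds_store(fawnds_t *ds, fawn_key_t key, const void *val, uint32_t len);

/* Fetch the current value for key into buf; returns 0 on success. */
int fawnds_lookup(fawnds_t *ds, fawn_key_t key, void *buf, uint32_t *len);

/* Remove the mapping for key; in a log-structured store this is typically
 * itself an appended (sequential) delete record.                         */
int fawnds_delete(fawnds_t *ds, fawn_key_t key);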
FAWN-DS is designed to perform well on flash storage
and to operate within the constrained DRAM available on
wimpy nodes: All writes to the datastore are sequential, and
reads require a single random access. To provide these properties, FAWN-DS maintains an in-DRAM hash table (Hash
Index) that maps keys to an offset in the append-only Data
Log on flash (Figure 2a). This log-structured design is similar to several append-only filesystems such as the Google
File System (GFS) and Venti, which avoid random seeks on
magnetic disks for writes.
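Continuing the hypothetical C sketch above, the fragment below illustrates the shape of these two structures and the write path: a Store is a sequential append to the Data Log followed by a DRAM-only index update. The struct layout, the index_put helper, and the use of plain write() are assumptions of the sketch rather than details taken from FAWN-DS.

#include <stdint.h>
#include <unistd.h>

/* In-DRAM Hash Index entry; its 6-byte layout is shown in a later sketch. */
typedef struct fawn_bucket fawn_bucket_t;

struct fawnds {
    fawn_bucket_t *index;       /* Hash Index: 2^i buckets in DRAM        */
    unsigned       index_bits;  /* i                                      */
    int            log_fd;      /* append-only Data Log file on flash     */
    uint32_t       log_end;     /* offset at which the next append lands  */
};

/* Hypothetical helper: point the bucket for key at offset off. */
void index_put(fawnds_t *ds, fawn_key_t key, uint32_t off);

/* Store: every flash write is a sequential append of <key, length, data>;
 * any previous version of the value is simply left behind in the log as
 * garbage to be cleaned up later.                                         */
int fawnds_store(fawnds_t *ds, fawn_key_t key, const void *val, uint32_t len)
{
    uint32_t off = ds->log_end;
    if (write(ds->log_fd, key.bytes, sizeof key.bytes) < 0) return -1;
    if (write(ds->log_fd, &len, sizeof len) < 0)            return -1;
    if (write(ds->log_fd, val, len) < 0)                    return -1;
    ds->log_end = off + (uint32_t)sizeof key.bytes + (uint32_t)sizeof len + len;
    index_put(ds, key, off);    /* record the new location in DRAM */
    return 0;
}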
Mapping a key to a value: FAWN-DS uses an in-memory
(DRAM) Hash Index to map 160-bit keys to a value stored in
the Data Log. It stores only a fragment of the actual key in
memory to find a location in the log; it then reads the full
key (and the value) from the log and verifies that the key it
read was, in fact, the correct key. This design trades a small
and configurable chance of requiring two reads from flash
(we set it to roughly 1 in 32,768 accesses) for drastically
reduced memory requirements (only 6 bytes of DRAM per
key-value pair).
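As an illustration, one plausible packing of the 6-byte bucket declared in the sketch above is shown below; the field order and the packed-struct trick are assumptions, while the field widths (15-bit key fragment, valid bit, 4-byte offset) come from the text.

#include <stdint.h>

struct fawn_bucket {             /* 6 bytes of DRAM per key-value pair    */
    uint16_t frag_valid;         /* bits 0-14: 15-bit key fragment,
                                    bit 15: valid flag                    */
    uint32_t offset;             /* 4-byte pointer into the Data Log      */
} __attribute__((packed));       /* packed so sizeof == 6 rather than 8   */

#define FRAG_BITS 15
#define FRAG_MASK ((1u << FRAG_BITS) - 1)    /* 0x7fff                    */
#define VALID_BIT (1u << FRAG_BITS)

/* A key that is not actually stored in a bucket still matches its 15-bit
 * fragment with probability 2^-15, so only about 1 lookup in 32,768 pays
 * for an extra read from flash.                                          */

Without the packed attribute, alignment would round each entry up to 8 bytes; a real implementation might equally well keep two parallel arrays of 2-byte and 4-byte fields.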
FAWN-DS’s Lookup procedure extracts two fields from the 160-bit key: the i low-order bits of the key (the index bits) and the next 15 low-order bits (the key fragment). FAWN-DS uses the index bits to select a bucket from the Hash Index, which contains 2^i hash buckets. Each bucket is 6 bytes: a 15-bit key fragment, a valid bit, and a 4-byte pointer to the location in the Data Log where the full entry is stored.
Lookup proceeds, then, by locating a bucket using the
index bits and comparing the key against the key fragment.
If the fragments do not match, FAWN-DS uses hash chaining to continue searching the hash table. Once it finds a
matching key fragment, FAWN-DS reads the record from
flash. If the stored full key in the on-flash record
matches the desired lookup key, the operation is complete.
Otherwise, FAWN-DS resumes its hash chaining search of
the in-memory hash table and searches additional records.
With the 15-bit key fragment, only 1 in 32,768 retrievals
from the flash will be incorrect and require fetching an
additional record.
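A sketch of this lookup path, continuing the structures above, appears below. Two caveats: the text calls the collision-handling scheme hash chaining without pinning down its form, so linear probing stands in for it here, and the extraction of the index bits and key fragment assumes the key's low-order bytes come first in memory.

#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Low-order 64 bits of the 160-bit key (assumes bytes[0] is least
 * significant); enough to hold the i index bits plus the 15-bit fragment. */
static uint64_t low_bits(fawn_key_t key)
{
    uint64_t low = 0;
    memcpy(&low, key.bytes, sizeof low);
    return low;
}

int fawnds_lookup(fawnds_t *ds, fawn_key_t key, void *buf, uint32_t *len)
{
    uint32_t nbuckets = 1u << ds->index_bits;
    uint32_t slot = (uint32_t)(low_bits(key) & (nbuckets - 1));    /* index bits */
    uint16_t frag = (uint16_t)((low_bits(key) >> ds->index_bits) & FRAG_MASK);

    for (uint32_t probed = 0; probed < nbuckets; probed++) {
        struct fawn_bucket *b = &ds->index[(slot + probed) & (nbuckets - 1)];
        if (!(b->frag_valid & VALID_BIT))
            return -1;                        /* empty bucket: key not stored     */
        if ((b->frag_valid & FRAG_MASK) != frag)
            continue;                         /* fragment mismatch: keep chaining */

        /* One random access to flash: the record header <full key, length>
         * and the data that follows it are contiguous in the Data Log.     */
        struct __attribute__((packed)) { uint8_t key[20]; uint32_t len; } hdr;
        if (pread(ds->log_fd, &hdr, sizeof hdr, b->offset) != (ssize_t)sizeof hdr)
            return -1;
        if (memcmp(hdr.key, key.bytes, sizeof hdr.key) != 0)
            continue;                         /* ~1-in-32,768 false hit: resume   */

        if (pread(ds->log_fd, buf, hdr.len, b->offset + (off_t)sizeof hdr)
                != (ssize_t)hdr.len)
            return -1;
        *len = hdr.len;
        return 0;
    }
    return -1;
}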
The constants involved (15 bits of key fragment, 4 bytes of log pointer) target the prototype FAWN nodes described in Section 4. A typical object is between 256 bytes and 1KB, and the nodes have 256MB of DRAM and approximately 4GB of flash storage. Because each physical node is responsible for V key ranges (each with its own datastore file), it can address 4GB × V bytes of data. Expanding the in-memory storage to 7 bytes per
entry would permit FAWN-DS to address 1TB of data per
key range. While some additional optimizations are possible, such as rounding the size of objects stored in flash
or reducing the number of bits used for the key fragment
(and thus incurring, e.g., a 1-in-1000 chance of having to
do two reads from flash), the current design works well
for the key-value workloads we study.
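The arithmetic behind these constants is easy to check; the short program below is a back-of-the-envelope sketch that reproduces the 4GB-per-datastore and 1TB-per-key-range figures (assuming the seventh byte widens the log pointer to 5 bytes) and the Hash Index's DRAM footprint on the prototype nodes.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* A 4-byte log pointer addresses 2^32 bytes = 4GB per datastore file. */
    uint64_t per_store_4b = 1ULL << 32;

    /* Assumed interpretation: a 7-byte entry spends the extra byte on a
     * 5-byte pointer, addressing 2^40 bytes = 1TB per key range.          */
    uint64_t per_store_5b = 1ULL << 40;

    /* With ~4GB of flash and objects of 256 bytes to 1KB, a node stores
     * roughly 4M to 16M key-value pairs; at 6 bytes each, the Hash Index
     * needs about 24MB to 96MB of DRAM, comfortably under 256MB.          */
    uint64_t flash = 4ULL << 30;
    uint64_t entries_min = flash / 1024, entries_max = flash / 256;

    printf("per-datastore capacity: %lluGB (4-byte ptr), %lluTB (5-byte ptr)\n",
           (unsigned long long)(per_store_4b >> 30),
           (unsigned long long)(per_store_5b >> 40));
    printf("Hash Index DRAM: %llu-%lluMB\n",
           (unsigned long long)(entries_min * 6 >> 20),
           (unsigned long long)(entries_max * 6 >> 20));
    return 0;
}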
Figure 2. (a) FAWN-DS appends writes to the end of the Data Log. (b) Split requires a sequential scan of the data region, transferring out-of-range entries to the new store. (c) After the scan completes, the datastore list is atomically updated to add the new store. Compaction of the original store cleans up out-of-range entries.
[Figure 2 labels: each Data Log entry holds a KeyFrag, the 160-bit Key, Len, and Data; each in-memory Hash Index entry holds a KeyFrag, a Valid bit, and an Offset; inserted values are appended to the Data Log while the datastore list is scanned, split, and atomically updated.]