it is responsible for) is a deterministic function of the back-end node ID. If a front-end node fails, data does not move
between back-end nodes, though virtual nodes may have to
attach to a new front end.
FAWN-KV uses a 160-bit circular ID space for VIDs and
keys. Virtual IDs are hashed identifiers derived from the
node’s address. Each VID owns the items for which it is the
successor in the ring space (the VID immediately clockwise
of the item’s position on the ring). As an example, consider the cluster
depicted in Figure 3 with five physical nodes, each of which
has two VIDs. The physical node A appears as VIDs A1 and A2,
each with its own 160-bit identifier. VID A1 owns key range
R1, VID B1 owns range R2, and so on.
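This successor lookup can be sketched in Python as follows; it is a minimal illustration in which the SHA-1 hash, the VID naming scheme, and the Ring class are assumptions for the example rather than FAWN-KV's actual code:

    import bisect
    import hashlib

    def ring_position(name):
        # Map a key or VID name onto the 160-bit circular ID space (SHA-1 here).
        return int.from_bytes(hashlib.sha1(name).digest(), "big")

    class Ring:
        """Consistent-hashing ring: each VID owns the keys it is the successor of."""
        def __init__(self, nodes, vids_per_node=2):
            # A node's VIDs are hashed identifiers derived from its address;
            # "A/1", "A/2" is an assumed naming scheme, not FAWN-KV's.
            self.vids = sorted(
                (ring_position(f"{node}/{i}".encode()), f"{node}{i}")
                for node in nodes
                for i in range(1, vids_per_node + 1)
            )
            self.positions = [pos for pos, _ in self.vids]

        def successor(self, key):
            # Return the VID immediately clockwise of the key's position.
            idx = bisect.bisect_right(self.positions, ring_position(key))
            return self.vids[idx % len(self.vids)][1]   # wrap around the circle

    ring = Ring(["A", "B", "C", "D", "E"])   # five physical nodes, two VIDs each
    owner = ring.successor(b"some-key")      # the VID that owns this key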
3.4.2. Replication and consistency
FAWN-KV offers a configurable replication factor for fault
tolerance. Items are stored at their successor in the ring
space and at the R − 1 following virtual IDs. FAWN-KV uses
chain replication21 to provide strong consistency on a per-key
basis. Updates are sent to the head of the chain, passed
along to each member of the chain via a TCP connection
between the nodes, and queries are sent to the tail of the
chain. By mapping chain replication to the consistent hashing ring, each virtual ID in FAWN-KV is part of R different
chains: it is the “tail” for one chain, a “mid” node in R − 2
chains, and the “head” for one. Figure 4 depicts a ring with
six physical nodes, where each has two virtual IDs (V = 2),
using a replication factor of 3. In this figure, node C1 is the
tail for range R1, mid for range R2, and head for range R3.

Figure 3. Consistent hashing with five physical nodes and two virtual IDs each.

Figure 4. Overlapping chains in the ring—each node in the ring is part of R = 3 chains.

Figure 5. Life cycle of a put with chain replication—puts go to the head and are propagated through the chain. Gets go directly to the tail.
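Carrying the earlier Ring sketch forward, the chain for a key can be derived directly from the ring: the key's successor VID is the head, the next R − 2 VIDs are mids, and the R-th is the tail. The snippet below is an assumed illustration of that mapping, not code from the paper:

    import bisect

    R = 3  # replication factor

    def chain_for_key(ring, key):
        # The R consecutive VIDs clockwise of the key replicate it:
        # chain[0] is the head, chain[1:-1] are mids, chain[-1] is the tail.
        start = bisect.bisect_right(ring.positions, ring_position(key))
        return [ring.vids[(start + i) % len(ring.vids)][1] for i in range(R)]

    chain = chain_for_key(ring, b"some-key")
    head, mids, tail = chain[0], chain[1:-1], chain[-1]
    # Every VID is therefore the head of one chain, the tail of one,
    # and a mid in the remaining R - 2 chains.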
Figure 5 shows a put request for an item in range R1.
The front end sends the put to the key’s successor, VID A1,
which is the head of the replica chain for this range. After
storing the value in its datastore, A1 forwards this request
to B1, which stores the value and forwards the request to the
tail, C1. After storing the value, C1 sends the put response
back to the front end and sends an acknowledgment back
up the chain indicating that the response was handled
properly.
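The message flow in Figure 5 can be sketched as a chain of replicas that store and then forward. This is a simplified, single-process illustration; the real nodes exchange these messages over TCP connections, and the class and method names here are hypothetical:

    class ChainNode:
        """One replica in a chain; next_node points toward the tail."""
        def __init__(self, name, next_node=None):
            self.name = name
            self.next_node = next_node
            self.store = {}                      # stands in for the FAWN-DS datastore

        def put(self, key, value, front_end):
            self.store[key] = value              # store locally, then pass it on
            if self.next_node is not None:
                self.next_node.put(key, value, front_end)
            else:
                front_end.put_response(key)      # the tail answers the front end
            # Returning here models the acknowledgment flowing back up the chain;
            # in the real system this is a separate put_ack message.
            return "put_ack"

    class FrontEnd:
        """Front end that issued the put; receives the response from the tail."""
        def put_response(self, key):
            print(f"put({key!r}) completed at the tail")

    # Replica chain for range R1: A1 (head) -> B1 (mid) -> C1 (tail)
    c1 = ChainNode("C1")
    b1 = ChainNode("B1", c1)
    a1 = ChainNode("A1", b1)
    a1.put("key-in-R1", "a value", FrontEnd())   # front end sends the put to the head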
For reliability, nodes buffer put requests until they
receive the acknowledgment. Because puts are written
to an append-only log in FAWN-DS and are sent in-order
along the chain, this operation is simple: nodes maintain
a pointer to the last unacknowledged put in their datastore
and increment it when they receive an acknowledgment.
By using a log-structured datastore, chain replication in
FAWN-KV reduces to simply streaming the datastore from
node to node.
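Because the datastore is an append-only log and acknowledgments arrive in the same order as puts, the buffer of unacknowledged puts is simply a suffix of the log. A minimal sketch, with names that are assumptions rather than FAWN-DS's actual interface:

    class ReplicaLog:
        """Append-only log plus a cursor marking the oldest unacknowledged put."""
        def __init__(self):
            self.log = []        # entries appended in chain order
            self.unacked = 0     # index of the first put not yet acknowledged

        def append_put(self, key, value):
            self.log.append((key, value))

        def ack_next(self):
            # Acks arrive in the same order puts were sent down the chain,
            # so acknowledging a put is just advancing the cursor.
            self.unacked += 1

        def pending(self):
            # Puts that would be re-sent to a new downstream replica.
            return self.log[self.unacked:]

Re-sending pending() to a downstream node is exactly the streaming of the datastore from node to node described above.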
Get requests proceed as in chain replication—the front
end directly routes gets to the tail of the chain for range R1,
node C1, which responds to requests. Any update seen by
the tail has therefore also been applied by other replicas in
the chain.
4. EVALUATION
We begin by characterizing the baseline I/O performance
of a node. We then show that FAWN-DS’s performance is
similar to the node’s baseline I/O capability. To illustrate
the advantages of FAWN-DS’s design, we compare its performance to an implementation using the general-purpose
BerkeleyDB, which is not optimized for flash writes. We
then study a prototype FAWN-KV system running on a
21-node cluster, evaluating its energy efficiency in queries
per second per Watt.
Evaluation hardware: Our FAWN cluster has 21 back-end
nodes built from commodity PCEngine Alix 3c2 devices,
commonly used for thin clients, kiosks, network firewalls,
wireless routers, and other embedded applications. These
devices have a single-core 500MHz AMD Geode LX processor, 256MB DDR SDRAM operating at 400MHz, and
100 Mbit/s Ethernet. Each node contains one 4GB Sandisk
Extreme IV CompactFlash device. A node consumes 3 W
when idle and a maximum of 6 W when using 100% CPU,
network, and flash. The nodes are connected to each other