Viewpoint
Despite the fact that RAM stands for random access memory, we would almost
never use the “new RAM” (disk) in random-access mode.
disks as a multi-terabyte main memory requires a
small amount of math, as well as several somewhat
larger caveats. A typical scientific computing cluster
includes 200GB of often-unclaimed disk space per
computer. A 50-node cluster provides 10TB of disk.
As a nice side benefit, in today’s commodity computer market, this 10TB of idle local disk space is
essentially free.
How can we treat 10TB of disk space as if it were
RAM? The answer depends on consideration of disk
bandwidth, disk latency, and network bandwidth:
Thesis. Because 50 disks provide approximately
the same bandwidth as a single RAM subsystem, the
local disks of a computer cluster can be regarded as if
they were a single very large RAM subsystem;
Caveat 1. Disk latency is much more limiting
than disk bandwidth. Therefore, despite the fact that
RAM stands for random access memory, we would
almost never use the “new RAM” (disk) in random-access mode. The old-fashioned RAM already serves
as our random-access cache;
Caveat 2. The new RAM is distributed across the
local-area network. The aggregate network bandwidth of a cluster (even gigabit Ethernet) may not
fully support the ideal 5GB/s aggregate bandwidth
of the new RAM. Parallel algorithms must therefore
be restructured to emphasize local access over network access. (This restriction is familiar to practitioners, who have long been aware of the
impossibility of accessing traditional remote RAM at
full speed over the network.)
TESTBED
The details of the Rubik’s Cube computation illustrate the benefits of disk-based computation.
Whereas people usually solve Rubik’s Cube in four
or five stages, each involving fewer than one million combinations, the large main memory of disk-based computation allows a programmer to provide
a two-stage solution where the largest subproblem
involves 1014combinations.
A person might first solve the top layer of the
Cube (with nine smaller cubies, or individual boxlike segments), then the bottom layer, and finally the
remaining middle pieces. Solving the bottom and
middle layers requires the use of macro moves, or
sequences of moves that preserve the previous layers.
The programmer solves each of the two subproblems by performing a breadth-first search over all
possible configurations, starting with the solved
state. For the smaller of the two subproblems ( 105
configurations), this is easy.
For the larger of the two subproblems (1014
configurations), we first used the symmetries of Rubik’s
Cube to reduce it to 1012 configurations. We then
analyzed several possible algorithms, settling on the
final version, enumerating the 1012 configurations in
63 hours with the help of 128 processor cores and
7TB of disk space.
The primary difficulty in trying to extend a naive
enumeration algorithm to execute on disk is
how to efficiently perform duplicate detection,
that is, to determine when a newly generated state
has been seen before. This is typically done using a
hash table or some other data structure that relies on
random access. In the disk-based version, we avoid
random access by delaying duplicate detection and
collect many new states we check for duplicates in a
later phase.
A brief description of the methods we considered
when solving Rubik’s Cube illustrates the kinds of