Viewpoint
Despite the fact that RAM stands for random access memory, we would almost never use the “new RAM” (disk) in random-access mode.
disks as a multi-terabyte main memory requires a small amount of math, as well as several somewhat larger caveats. A typical scientific computing cluster includes 200GB of often-unclaimed disk space per computer. A 50-node cluster provides 10TB of disk. As a nice side benefit, in today’s commodity computer market, this 10TB of idle local disk space is essentially free.
How can we treat 10TB of disk space as if it were RAM? The answer depends on consideration of disk bandwidth, disk latency, and network bandwidth:
Thesis. Because 50 disks provide approximately the same bandwidth as a single RAM subsystem, the local disks of a computer cluster can be regarded as if they were a single very large RAM subsystem;
Caveat 1. Disk latency is much more limiting than disk bandwidth. Therefore, despite the fact that RAM stands for random access memory, we would almost never use the “new RAM” (disk) in random-access mode. The old-fashioned RAM already serves as our random-access cache;
Caveat 2. The new RAM is distributed across the local-area network. The aggregate network bandwidth of a cluster (even gigabit Ethernet) may not fully support the ideal 5GB/s aggregate bandwidth of the new RAM. Parallel algorithms must therefore be restructured to emphasize local access over network access. (This restriction is familiar to practitioners, who have long been aware of the impossibility of accessing traditional remote RAM at full speed over the network.)
TESTBED
The details of the Rubik’s Cube computation illustrate the benefits of disk-based computation. Whereas people usually solve Rubik’s Cube in four
or five stages, each involving fewer than one million combinations, the large main memory of disk-based computation allows a programmer to provide a two-stage solution where the largest subproblem involves 1014combinations.
A person might first solve the top layer of the Cube (with nine smaller cubies, or individual boxlike segments), then the bottom layer, and finally the remaining middle pieces. Solving the bottom and middle layers requires the use of macro moves, or sequences of moves that preserve the previous layers.
The programmer solves each of the two subproblems by performing a breadth-first search over all possible configurations, starting with the solved state. For the smaller of the two subproblems ( 105 configurations), this is easy.
For the larger of the two subproblems (1014 configurations), we first used the symmetries of Rubik’s Cube to reduce it to 1012 configurations. We then analyzed several possible algorithms, settling on the final version, enumerating the 1012 configurations in 63 hours with the help of 128 processor cores and 7TB of disk space.
The primary difficulty in trying to extend a naive enumeration algorithm to execute on disk is how to efficiently perform duplicate detection, that is, to determine when a newly generated state has been seen before. This is typically done using a hash table or some other data structure that relies on random access. In the disk-based version, we avoid random access by delaying duplicate detection and collect many new states we check for duplicates in a later phase.
A brief description of the methods we considered when solving Rubik’s Cube illustrates the kinds of
References:
Archives