Viewpoint
Daniel Kunkle and Gene Cooperman
Solving Rubik’s Cube:
Disk Is the New RAM
By substituting disk for RAM, disk-based computation increases working memory and achieves results that would not otherwise be economical.
Disk-based computation represents a major new use of disks, in addition to the three historical uses: file systems, databases, and virtual memory. We recently demonstrated the importance of this fourth case by showing progress on a 25-year-old conjecture: determining how many moves suffice to solve Rubik’s Cube. We chose Rubik’s Cube because it has long served as a computationally challenging problem on which practitioners from a variety of disciplines have tested the efficacy of their techniques.
Our working group coined the term “disk-based computation” to describe our five-year effort to make use of parallel disks in scientific computation, including the many disks already available in a computational cluster. In doing so, the humble disk is elevated to a status normally reserved for RAM. RAM equivalence gives an application several orders of magnitude more working space for the same financial price. Such parallel disk-based methods are often based on lower-level external memory algorithms (such as those surveyed in [3]).
Our work reached the mainstream media in 2007 when we showed that Rubik’s Cube can be solved in 26 moves or less [1]. At its heart, our computation simply enumerates and stores possible configurations of the puzzle. But, with more than 4.3 × 10^19 possible configurations, proving that 26 moves suffice requires many terabytes of main memory. It was only our insight that “disk is the new RAM” that enabled us to overcome this memory barrier.
Rubik’s Cube is an example of a large enumeration problem for which disk-based computation may lead to breakthroughs in many different problem domains, including group theory, hardware and software verification, coding theory, and constraint satisfaction. In each of these domains, one has an initial state, a method for producing neighboring states, and a need to store all reachable states. Powerful new multicore computers are beginning to allow us to generate neighboring states faster than ever before. However, that ability often means we also reach the limits of RAM more quickly than ever before.
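To make the pattern concrete, here is a minimal sketch of such an enumeration as a breadth-first search whose visited set lives on disk. It is only illustrative, not the authors’ implementation, and the neighbors function passed in is a hypothetical placeholder for the problem-specific move generator:

    import shelve  # standard-library, disk-backed key-value store

    def enumerate_states(start, neighbors, db_path="visited.db"):
        """Breadth-first enumeration whose visited set is stored on
        disk, so the reachable set can exceed available RAM."""
        with shelve.open(db_path) as visited:
            visited[repr(start)] = 0      # map each state to its depth
            frontier, depth = [start], 0
            while frontier:
                depth += 1
                next_frontier = []
                for state in frontier:
                    for nbr in neighbors(state):  # neighboring states
                        key = repr(nbr)
                        if key not in visited:    # duplicate detection
                            visited[key] = depth
                            next_frontier.append(nbr)
                frontier = next_frontier
            return depth - 1              # deepest level reached

A production disk-based search would also keep the frontiers themselves on disk and batch the duplicate detection (for example, by sorting) so that random lookups become the sequential I/O that disks handle well.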
Limiting ourselves to 4GB of main memory per computer is an arbitrary restriction not required by current technology. We are all conditioned by decades of history to regard disk as a hopelessly slow cousin to RAM. However, a simple back-of-the-envelope calculation shows this does not have to be so. The bandwidth of commodity disks is on the order of 100MB/s. A computer cluster with 50 disks provides 50 times the aggregate bandwidth, or 5GB/s, which is close to the bandwidth of commodity RAM. Thus 50 local disks provide the moral equivalent of a single extremely large RAM subsystem.
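The arithmetic behind that estimate is short enough to state directly; the figures are the ones above, and the 10TB scan time is an illustrative extra:

    DISK_BW = 100e6                # bytes/s per commodity disk (~100MB/s)
    NUM_DISKS = 50                 # disks in a modest cluster
    aggregate = DISK_BW * NUM_DISKS
    print(aggregate / 1e9)         # 5.0 GB/s, comparable to commodity RAM

    # At that rate, streaming a full 10TB data set takes about half an hour:
    print(10e12 / aggregate / 60)  # roughly 33 minutes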
Viewed this way, a 50-node scientific computing cluster would be able to perform like a powerful parallel computer endowed with a single 10TB RAM subsystem. Justifying the use of distributed