Vie w p o i n t Daniel Kunkle and Gene Cooperman
Substituting disk for RAM, disk-based computation is a way to increase working memory and achieve results that are not otherwise economical.
Disk-based computation represents a major new use of disks, in addition to the three historical uses: file systems, databases, and virtual memory. We recently demonstrated the importance of this fourth case by showing progress on a 25- year-old conjecture: determine how many moves suffice to solve Rubik’s Cube. We chose Rubik’s Cube because it has long served as a computationally challenging problem in which practitioners from a variety of disciplines have tested the efficacy of their techniques.
Our working group coined the term “disk-based computation” to describe our five-year effort to make use of parallel disks in scientific computation, including the many disks already available in a computational cluster. In doing so, the humble disk is elevated to a status normally reserved for RAM. RAM equivalence gives an application several orders of magnitude more working space for the same financial price. Such parallel disk-based methods are often based on lower-level external memory algorithms (such as those surveyed in [ 3]).
LISA HANE Y
Our work reached the mainstream media in 2007 when we showed that Rubik’s Cube can be solved in 26 moves or less [ 1]. At its heart, our computation simply enumerates and stores possible configurations of the puzzle. But, with more than 4. 3 1019 possible configurations, proving that 26 moves suffice
requires many terabytes of main memory. It was only our insight that “disk is the new RAM” that enabled us to overcome this memory barrier.
Rubik’s Cube is an example of a large enumeration problem for which disk-based computation may lead to breakthroughs in many different problem domains, including group theory, hardware and software verification, coding theory, and constraint satisfaction. In them, one has an initial state, a method to produce neighboring states, and a need to store all reachable states. New powerful multicore computers are beginning to allow us to generate neighboring states faster than ever before. However, the ability to do so often means we also reach the limits of RAM more quickly than ever before.
Limiting ourselves to 4GB of main memory per computer is an arbitrary restriction not required by current technology. We are all conditioned by decades of history to regard disk as a hopelessly slow cousin to RAM. However, a simple back-of-the-envelope calculation shows this does not have to be so. The bandwidth of commodity disks is on the order of 100MB/s. A computer cluster with 50 disks provides 50 times the aggregate bandwidth, or 5GB/s, which is close to the bandwidth of commodity RAM. Thus 50 local disks provide the moral equivalent of a single extremely large RAM subsystem.
Viewed this way, a 50-node scientific computing cluster would be able to perform like a powerful parallel computer endowed with a single 10TB RAM subsystem. Justifying the use of distributed
References:
Archives