Few computations exceed that threshold; most are better matched to a Beowulf cluster. CFD (computational fluid dynamics) is very CPU intensive, but again, CFD generates
a continuous and voluminous output stream. To give an example of an adaptive mesh
simulation, the Cornell Theory Center has a Beowulf-class MPI (message passing interface) job that simulates crack propagation in a mechanical object. 6 It has about 100 MB of
input, 10 GB of output, and runs for more than seven CPU years. The computation operates at more than 1 million instructions per byte, and so is a good candidate for export to
the WAN computational grid. But the computation's bisection bandwidth requirement means it must
be executed in a tightly connected cluster. Such applications need the inexpensive bandwidth available within a Beowulf cluster. 7 In a Beowulf cluster, networking is 10,000 times less
expensive than WAN networking, which makes it seem nearly free by comparison.
Still, there are some computationally intensive jobs that can use grid computing.
Render farms for making animated movies seem to be good candidates for grid computing. Rendering a frame can take many CPU hours, so a grid-scale render farm begins to
make sense. For example, Pixar’s Toy Story 2 images are very CPU intensive: a 200-MB
image can take several CPU hours to render. The instruction density was 200k to 600k
instructions per byte. 8 This could be structured as a grid computation—sending a 50-MB
task to a server that computes for 10 hours and returns a 200-MB image.
BLAST, FASTA, and Smith-Waterman are an interesting case in point—they are
mobile in the rare case of a 40-CPU-day computation. These computations match a
DNA sequence against a database such as GenBank or Swiss-Prot. The databases are
about 50 GB today. The algorithms are quite CPU intensive, but they scan large parts of
the database. Servers typically store the database in RAM. BLAST is a heuristic that is 10
times faster than Smith-Waterman, which gives exact results. 9, 10 Most BLAST computations can run in a few minutes of CPU time, but there are computations that can take a
CPU month on BLAST and a CPU year on Smith-Waterman. So, it would be economical
to send Swiss-Prot (40 GB) to a server if it were to perform a 7,720-hour computation
for free. Typically, it does not make sense to provision a Swiss-Prot database on demand;
rather, it makes sense to set up dedicated servers (much as Google does) that use inexpensive processors and memory to provide such searches. A commodity 40-GB SMP server
would cost less than $20,000 and could deliver a complex 1-CPU-hour search for less
than a dollar—the typical one-minute search would be a few millidollars.
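The per-search prices above can be roughly reproduced by amortizing the server's purchase price over its CPU hours. The sketch below is a back-of-the-envelope check, not the author's calculation: the four-CPU configuration, the three-year service life, and full utilization are all assumptions introduced here, and power, space, and administration costs are ignored.

```python
HOURS_PER_YEAR = 365 * 24


def cost_per_cpu_hour(server_price_usd: float, cpus: int, lifetime_years: float) -> float:
    """Hardware cost of one CPU hour, assuming the server is always busy."""
    return server_price_usd / (cpus * lifetime_years * HOURS_PER_YEAR)


# $20,000 comes from the text; cpus=4 and lifetime_years=3 are assumptions.
rate = cost_per_cpu_hour(20_000, cpus=4, lifetime_years=3)

one_hour_search_usd = rate                        # a complex 1-CPU-hour search
one_minute_search_millidollars = rate / 60 * 1000  # a typical one-minute search

print(f"1-CPU-hour search: ${one_hour_search_usd:.2f}")
print(f"one-minute search: {one_minute_search_millidollars:.1f} millidollars")
```

Under these assumptions the hour-long search costs well under a dollar and the one-minute search a few millidollars, consistent with the figures in the text. A lower utilization or a shorter service life raises both numbers proportionally.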
CONCLUSIONS
Put the computation near the data. The recurrent theme of this analysis is that on-demand computing is economical only for very CPU-intensive applications (100,000 instructions per
byte, or one CPU day per gigabyte of network traffic). Pre-provisioned computing is likely to be more economical for most applications, especially data-intensive ones.
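The break-even rule can be turned into a quick check. The following sketch uses only the equivalence stated above (100,000 instructions per byte is about one CPU day per gigabyte of network traffic) and applies it to the render and crack-propagation examples from earlier in the article. The function names are illustrative, and a gigabyte is assumed to be 1,000 MB.

```python
# Equivalence taken from the text: 100,000 instructions per byte of network
# traffic corresponds to roughly one CPU day per gigabyte.
THRESHOLD_INSTR_PER_BYTE = 100_000
THRESHOLD_CPU_DAYS_PER_GB = 1.0


def instructions_per_byte(cpu_days: float, traffic_gb: float) -> float:
    """Estimate instruction density from CPU time and total network traffic."""
    cpu_days_per_gb = cpu_days / traffic_gb
    return cpu_days_per_gb * THRESHOLD_INSTR_PER_BYTE / THRESHOLD_CPU_DAYS_PER_GB


def worth_exporting(cpu_days: float, traffic_gb: float) -> bool:
    """True if the job meets the break-even density for on-demand computing."""
    return instructions_per_byte(cpu_days, traffic_gb) >= THRESHOLD_INSTR_PER_BYTE


# Render task: 50 MB sent, 200 MB returned, 10 CPU hours of work.
render = instructions_per_byte(cpu_days=10 / 24, traffic_gb=0.25)

# Crack propagation: 100 MB input, 10 GB output, seven CPU years of work.
crack = instructions_per_byte(cpu_days=7 * 365, traffic_gb=10.1)

print(f"render task: {render:,.0f} instructions per byte")
print(f"crack job:   {crack:,.0f} instructions per byte")
```

Both jobs clear the threshold on this measure. The check is necessary but not sufficient: the crack-propagation job still belongs on a tightly connected cluster because of its internal bandwidth needs, which this ratio does not capture.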
How do you combine data from multiple sites? Many applications need to integrate
data from multiple sites into a combined answer. The preceding arguments suggest that
one should push as much of the processing to the data sources as possible in order to
filter the data early (database query optimizers call this “pushing predicates down the
query tree”). There are many techniques for doing this, but fundamentally it dovetails
with the notion that each data source is a Web service with a high-level object-oriented
interface.
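A minimal sketch of pushing a predicate to the data sources follows. The `DataSource` class, its methods, and the sample rows are hypothetical stand-ins for remote sites, introduced only to compare how many rows cross the network when filtering happens at the client versus at the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]
Predicate = Callable[[Row], bool]


@dataclass
class DataSource:
    """Hypothetical stand-in for a remote site that owns some rows."""
    name: str
    rows: List[Row]

    def fetch_all(self) -> List[Row]:
        # Ships every row across the network; the caller filters afterwards.
        return list(self.rows)

    def query(self, predicate: Predicate) -> List[Row]:
        # Evaluates the predicate at the source; only matching rows are shipped.
        return [row for row in self.rows if predicate(row)]


def combine_without_pushdown(sources: List[DataSource], predicate: Predicate) -> Tuple[List[Row], int]:
    shipped = [row for s in sources for row in s.fetch_all()]
    return [row for row in shipped if predicate(row)], len(shipped)


def combine_with_pushdown(sources: List[DataSource], predicate: Predicate) -> Tuple[List[Row], int]:
    shipped = [row for s in sources for row in s.query(predicate)]
    return shipped, len(shipped)


sites = [
    DataSource("site_a", [{"id": i, "score": i % 10} for i in range(1000)]),
    DataSource("site_b", [{"id": i, "score": i % 7} for i in range(1000)]),
]


def high_score(row: Row) -> bool:
    return row["score"] >= 9


naive_result, naive_shipped = combine_without_pushdown(sites, high_score)
pushed_result, pushed_shipped = combine_with_pushdown(sites, high_score)

print(f"rows shipped without pushdown: {naive_shipped}")
print(f"rows shipped with pushdown:    {pushed_shipped}")
```

Both strategies produce the same combined answer, but the pushed-down version ships only the matching rows. In a real deployment the predicate would travel as a query to each site's service interface rather than as a Python function.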