Few computations exceed that threshold; most are better matched to a Beowulf cluster. CFD (computational fluid dynamics) is very CPU intensive, but again, CFD generates a continuous and voluminous output stream. To give an example of an adaptive mesh simulation, the Cornell Theory Center has a Beowulf-class MPI (message passing interface) job that simulates crack propagation in a mechanical object [6]. It has about 100 MB of input, 10 GB of output, and runs for more than seven CPU years. The computation operates at more than 1 million instructions per byte, so by that measure it is a good candidate for export to the WAN computational grid. But the computation’s bisection bandwidth requires that it be executed in a tightly connected cluster. These applications require the inexpensive bandwidth available in a Beowulf cluster [7]. In a Beowulf cluster, networking is 10,000 times less expensive than WAN networking, which makes it seem nearly free by comparison.
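As a rough check on the "more than 1 million instructions per byte" figure, the sketch below divides the total instructions executed by the total bytes moved. The CPU rate of 1e9 instructions per second is an assumption (this passage does not state one), as are the decimal MB and GB sizes and the function name.

```python
# Rough instruction density of the crack-propagation job described above.
# ASSUMPTION: a CPU executes about 1e9 instructions per second.
# ASSUMPTION: MB and GB are decimal (1e6 and 1e9 bytes).

SECONDS_PER_YEAR = 365 * 24 * 3600
ASSUMED_INSTRUCTIONS_PER_SECOND = 1e9


def instructions_per_byte(cpu_seconds, bytes_moved,
                          ips=ASSUMED_INSTRUCTIONS_PER_SECOND):
    """Instructions executed per byte of input plus output."""
    return cpu_seconds * ips / bytes_moved


cpu_seconds = 7 * SECONDS_PER_YEAR   # "more than seven CPU years"
bytes_moved = 100e6 + 10e9           # about 100 MB in, 10 GB out

crack_density = instructions_per_byte(cpu_seconds, bytes_moved)
print(f"{crack_density:.2e} instructions per byte")
```

Under these assumptions the density comes out in the tens of millions of instructions per byte, which is consistent with the stated lower bound of 1 million.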

Still, there are some computationally intensive jobs that can use grid computing.

Render farms for making animated movies seem to be good candidates for grid computing. Rendering a frame can take many CPU hours, so a grid-scale render farm begins to make sense. For example, Pixar’s Toy Story 2 images are very CPU intensive: a 200-MB image can take several CPU hours to render. The instruction density was 200k to 600k instructions per byte [8]. This could be structured as a grid computation: sending a 50-MB task to a server that computes for 10 hours and returns a 200-MB image.
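The render task can be checked against the rule of thumb stated in the conclusions, roughly one CPU day of work per gigabyte of network traffic. This is a minimal sketch: the function name is hypothetical, and decimal megabytes are assumed.

```python
# Break-even check for the render task: 50 MB in, 200 MB out, 10 CPU hours.
# Rule of thumb from the article: about 1 CPU day per GB of network traffic.
# ASSUMPTION: 1 GB = 1000 MB.

CPU_HOURS_PER_GB = 24.0


def breakeven_cpu_hours(traffic_gb):
    """CPU hours a task must consume before shipping its data pays off."""
    return traffic_gb * CPU_HOURS_PER_GB


traffic_gb = (50 + 200) / 1000       # task input plus rendered image
needed = breakeven_cpu_hours(traffic_gb)
render_cpu_hours = 10
worth_exporting = render_cpu_hours >= needed

print(f"break-even: {needed:.1f} CPU hours, export: {worth_exporting}")
```

With 0.25 GB of traffic the break-even is 6 CPU hours, so a 10-hour render clears the bar with some margin.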

BLAST, FASTA, and Smith-Waterman are an interesting case in point: they are mobile only in the rare case of a 40-CPU-day computation. These computations match a DNA sequence against a database such as GenBank or Swiss-Prot. The databases are about 50 GB today. The algorithms are quite CPU intensive, but they scan large parts of the database. Servers typically store the database in RAM. BLAST is a heuristic that is 10 times faster than Smith-Waterman, which gives exact results [9, 10]. Most BLAST computations can run in a few minutes of CPU time, but there are computations that can take a CPU month on BLAST and a CPU year on Smith-Waterman. So, it would be economical to send Swiss-Prot (40 GB) to a server if it were to perform a 7,720-hour computation for free. Typically, it does not make sense to provision a Swiss-Prot database on demand; rather, it makes sense to set up dedicated servers (much like Google) that use inexpensive processors and memory to provide such searches. A commodity 40-GB SMP server would cost less than $20,000 and could deliver a complex 1-CPU-hour search for less than a dollar, while the typical one-minute search would cost a few millidollars.
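The per-search costs quoted for a dedicated server can be approximated by amortizing the hardware price over its working life. The sketch below is illustrative only: the three-year lifetime, the four CPUs, and full utilization are assumptions not given in the text, and power, space, and administration costs are ignored.

```python
# Amortized hardware cost of a search on a dedicated database server.
# From the text: the server costs less than $20,000.
# ASSUMPTIONS: 3-year lifetime, 4 CPUs, every CPU hour is used.

HOURS_PER_YEAR = 365 * 24


def cost_per_cpu_hour(server_price, lifetime_years, cpus):
    """Hardware cost only, spread evenly over all CPU hours."""
    return server_price / (lifetime_years * HOURS_PER_YEAR * cpus)


SERVER_PRICE = 20_000    # upper bound quoted in the text
LIFETIME_YEARS = 3       # assumed
CPUS = 4                 # assumed

one_hour_search = cost_per_cpu_hour(SERVER_PRICE, LIFETIME_YEARS, CPUS)
one_minute_search = one_hour_search / 60

print(f"1-CPU-hour search: ${one_hour_search:.2f}")
print(f"1-minute search:   {one_minute_search * 1000:.1f} millidollars")
```

Under these assumptions the hour-long search costs well under a dollar and the one-minute search costs a few millidollars, in line with the figures above. A lower utilization or a shorter lifetime would raise both numbers proportionally.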

CONCLUSIONS

Put the computation near the data. The recurrent theme of this analysis is that on-demand computing is economical only for very CPU-intensive (100,000 instructions per byte or a CPU-day-per-gigabyte of network traffic) applications. Pre-provisioned computing is likely to be more economical for most applications—especially data-intensive ones.
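The two forms of the threshold, 100,000 instructions per byte and one CPU day per gigabyte, agree only for a particular CPU speed. The sketch below works out that implied speed and converts between the two units. A decimal gigabyte is assumed, and the function name is hypothetical.

```python
# Relating "100,000 instructions per byte" to "1 CPU day per gigabyte".
# ASSUMPTION: 1 GB = 1e9 bytes.

SECONDS_PER_DAY = 24 * 3600
BYTES_PER_GB = 1e9
THRESHOLD_INSTRUCTIONS_PER_BYTE = 100_000

# CPU speed at which both statements describe the same amount of work.
implied_ips = THRESHOLD_INSTRUCTIONS_PER_BYTE * BYTES_PER_GB / SECONDS_PER_DAY


def cpu_days_per_gb(instructions_per_byte, ips=implied_ips):
    """Convert an instruction density into CPU days per gigabyte moved."""
    return instructions_per_byte * BYTES_PER_GB / ips / SECONDS_PER_DAY


print(f"implied CPU speed: {implied_ips:.3e} instructions per second")
print(f"1,000,000 instr/byte = {cpu_days_per_gb(1_000_000):.1f} CPU days per GB")
```

The implied speed is a little over one billion instructions per second. On a faster processor the same instruction density corresponds to fewer CPU days per gigabyte.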

How do you combine data from multiple sites? Many applications need to integrate data from multiple sites into a combined answer. The preceding arguments suggest that one should push as much of the processing to the data sources as possible in order to filter the data early (database query optimizers call this “pushing predicates down the query tree”). There are many techniques for doing this, but fundamentally it dovetails with the notion that each data source is a Web service with a high-level object-oriented interface.
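To illustrate filtering at the source, the following sketch compares shipping every row to a central site with pushing the predicate down to each source. The `DataSource` class, the site names, and the sample rows are all hypothetical stand-ins for remote services, not an interface described in the article.

```python
# Predicate pushdown across several data sources (illustrative sketch).
# ASSUMPTION: DataSource and its rows are hypothetical, in-memory stand-ins
# for remote sites; "shipping" is modeled as returning a list of rows.

from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, int]


@dataclass
class DataSource:
    name: str
    rows: List[Row]

    def fetch_all(self) -> List[Row]:
        # Naive approach: every row crosses the network.
        return list(self.rows)

    def query(self, predicate: Callable[[Row], bool]) -> List[Row]:
        # Pushed-down approach: filter locally, ship only the matches.
        return [row for row in self.rows if predicate(row)]


def is_hot(row: Row) -> bool:
    return row["temp_c"] > 30


sites = [
    DataSource("site_a", [{"id": 1, "temp_c": 12}, {"id": 2, "temp_c": 35}]),
    DataSource("site_b", [{"id": 3, "temp_c": 41}, {"id": 4, "temp_c": 8},
                          {"id": 5, "temp_c": 19}]),
]

# Filter after shipping: all rows travel, most are then discarded.
shipped_naive = [row for site in sites for row in site.fetch_all()]
result_naive = [row for row in shipped_naive if is_hot(row)]

# Filter before shipping: only matching rows travel.
shipped_pushed = [row for site in sites for row in site.query(is_hot)]

print(len(shipped_naive), "rows shipped without pushdown")
print(len(shipped_pushed), "rows shipped with pushdown")
```

Both approaches produce the same answer; the difference is how much data crosses the network. Exposing `query` rather than `fetch_all` corresponds to giving each data source a high-level interface rather than raw access to its data.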

References:

http://www.acmqueue.com
