Figure 1: In this 75x90 arcmin view of the Rho Oph dark cloud
as seen by 2MASS, the three-color composite is constructed using
Montage. J band is shown as blue, H as green, and K as red.
(Image courtesy of Bruce Berriman and J. Davy Kirkpatrick.)
Southern California [38]. These maps show the maximum seismic
shaking that can be expected to happen in a given region over a period
of time (typically 50 years).
Figure 3 shows a map constructed from individual computational
points. Each point is obtained from a hazard curve (shown around the
map) and each curve is generated by a workflow containing approximately 800,000 to 1,000,000 computational tasks [6]. This application
requires large-scale computing capabilities such as those provided by
the NSF TeraGrid [47].
Figure 2: A graphical representation of the Montage workflow
with 1,200 computational tasks represented as ovals. The lines
connecting the tasks represent data dependencies.
Figure 3: In this shake map of Southern California, points on
the map indicate geographic sites where the CyberShake
calculations were performed. The curves show the results of
the calculations. (Image courtesy of CyberShake Working Group,
Southern California Earthquake Center including Scott
Callaghan, Kevin Milner, Patrick Small, and Tom Jordan.)
In order to support such workflows, software systems need to
1) adapt the workflows to the execution environment (which, by necessity, is often heterogeneous and distributed),
2) optimize workflows for performance to provide a reasonable time to solution,
3) provide reliability so that scientists do not have to manage the potentially large numbers of failures, and
4) manage data so that it can be easily found and accessed at the end of the execution.
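To make the reliability requirement concrete, the following is a minimal sketch, not the method of any particular workflow system. It assumes a workflow can be described as a dictionary of tasks plus a dictionary of data dependencies, runs the tasks in dependency order, and retries a failed task a fixed number of times. The function name run_workflow, the max_retries parameter, and the three task names are hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, max_retries=2):
    """Run tasks in dependency order, retrying each failed task.

    tasks: dict mapping task name -> zero-argument callable
    dependencies: dict mapping task name -> set of tasks it depends on
    Returns the task names in the order they completed.
    """
    completed = []
    # static_order() yields each task only after all of its predecessors.
    for name in TopologicalSorter(dependencies).static_order():
        for attempt in range(max_retries + 1):
            try:
                tasks[name]()
                completed.append(name)
                break
            except RuntimeError:
                if attempt == max_retries:
                    raise  # give up once the retries are exhausted


    return completed


# Hypothetical three-step workflow; the first task fails once, then succeeds.
attempts = {"reproject": 0}


def flaky_reproject():
    attempts["reproject"] += 1
    if attempts["reproject"] == 1:
        raise RuntimeError("transient failure")


tasks = {
    "reproject": flaky_reproject,
    "background": lambda: None,
    "coadd": lambda: None,
}
dependencies = {
    "reproject": set(),
    "background": {"reproject"},
    "coadd": {"background"},
}
order = run_workflow(tasks, dependencies)
```

In this sketch the scientist never sees the transient failure: the retry happens inside the runner, and the dependent tasks start only once their input task has succeeded. A production system would also need to address the other three requirements, which this example does not attempt.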
Science Clouds
Today, clouds are also emerging in academia, providing a limited
number of computational platforms on demand: Cumulus [49],
Eucalyptus [33], Nimbus [31], OpenNebula [43]. These science clouds
provide a great opportunity for researchers to test out their ideas and
harden codes before investing more significant resources and money
into the potentially larger-scale commercial infrastructure.
To support the needs of a large number of different users with different demands in the software environment, clouds are primarily
built using resource virtualization technologies [2, 7, 50] that enable
the hosting of a number of different operating systems and associated
software and configurations on a single hardware host.
Clouds that provide computational capacity (Amazon EC2 [1],
Nimbus, Cumulus) are often referred to as infrastructure as a service (IaaS) clouds because they provide the basic computing resources needed
to deploy applications and services. Platform as a service (PaaS) clouds
such as Google App Engine [17] provide an entire application development environment including frameworks, libraries, and a deployment container. Finally, software as a service (SaaS) clouds provide
complete end-user applications for tasks such as photo sharing,
instant messaging [25], and many others.
Commercial clouds were built with business users in mind, but scientific applications can benefit from them as well. Scientists, however,
often have different requirements than enterprise customers. In particular, scientific codes often have parallel components and use MPI
[18] or shared memory to manage message-based communication
between processors. More coarse-grained parallel applications such as
workflows rely on a shared file system to pass data between processes.
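The file-based handoff described above can be sketched as follows. This is an illustration under stated assumptions rather than a description of any specific workflow system: a temporary directory stands in for the shared file system, both steps run in one process for simplicity, and the names producer, consumer, and stage1.json are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def producer(shared_dir):
    """Write an intermediate result to the shared directory."""
    out_path = Path(shared_dir) / "stage1.json"
    out_path.write_text(json.dumps({"values": [1, 2, 3]}))
    return out_path


def consumer(in_path):
    """Read the intermediate file and compute a result from it."""
    data = json.loads(Path(in_path).read_text())
    return sum(data["values"])


# A temporary directory stands in for a shared file system mount.
with tempfile.TemporaryDirectory() as shared_dir:
    handoff = producer(shared_dir)
    total = consumer(handoff)
```

In a real deployment the two steps would typically run as separate processes, possibly on different nodes, and the only thing they would share is the path to the intermediate file. That is why the availability of a shared file system matters for this class of application.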
Crossroads
www.acm.org/crossroads
Spring 2010/ Vol. 16, No. 3