figure 1. the Intrepid Blue Gene/P open Science machine at argonne national Laboratory, with 163,840 cores and 557tflops/sec (peak).
photograph Courtesy of argoNNe NatIoNaL Laboratory
ing essentially equivalent execution
sequences, thus reducing testing efficiency. These methods fare even worse
at large problem scales. Consider the
costs of HPC bugs. A high-end HPC
center costs hundreds of millions of
dollars to commission, and the machines become obsolete within six
years; in many centers, the annual
electricity bill can run more than $3
million, and research teams apply for
computer time through competitive
proposals, spending years planning
their experiments. In addition to these
costs, one must add the costs to society
of relying on potentially defective software to inform decisions involving issues of great public importance (such
as climate change).
Formal methods can play an important role in debugging and verifying
MPI applications. Here, we describe
existing techniques, including their
pros and cons, and why they have value
beyond MPI, addressing the general
needs of future concurrency application developers who will inevitably use
low-level concurrency APIs.
Historically, parallel systems
have used either message passing or
shared memory for communication.
Compared to other message-passing
systems noted for their parsimony,
MPI supports a large number of co-
hesively engineered features essential
for designing large-scale simulations;
for example, MPI- 2.221 specifies more
than 300 functions, though most de-
velopers use only a few dozen in any
given application.