technical Perspective
Best Algorithms + Best
Computers = Powerful match
By William Gropp
SAy yoU WANT to simulate the motion
over time of the stars in a galaxy to
learn about how galaxies formed and
why the universe appears as it does to
us. This seems like a simple computational problem—the laws of motion are
governed by Newton’s laws (let’s ignore
relativity) and simply require computing the force of every star on every other.
For N stars, that is N2 operations, and for
billions of stars, that is a billion billion
operations. Is it feasible on a single processor? Sadly, even today’s fastest single
processor and one single moment in
time, the answer is “no.” Even worse,
this computation must be repeated millions or billions of times, as the motion
of each star is followed. (If you prefer
biology or materials science, replace
“stars” with “atoms.”)
So is parallelism the answer? Unfortunately not. Even the biggest systems
today do not provide enough processing power to handle these problems
(supercomputers may be fast, but they
are not infinitely fast).
What can we do?
The following paper by Lashuk et al.
shows the way. Solving such a difficult
problem requires taking advantage of
everything possible. The authors start
with the algorithm. While the obvious
way of computing the mutual interac-
tion of N bodies on each other takes
quadratic time in the number of bod-
ies, there is a clever method that, with
rigorous error bounds, can compute
those interactions to any specified ac-
curacy in linear time! This algorithm,
developed 25 years ago by Greengard
and Rohklin, exploits a hierarchi-
cal representation of the forces. But
even a linear-time algorithm on a
single processor is not fast enough
for the problems that scientists want
to solve. So, having picked an excel-
lent algorithm, the authors proceed
to parallelize it. However, the choice
of algorithm does not make that easy.
Because the algorithm uses a hier-
archical representation, it is natural
that the data structures involved are
trees, and because the distribution of
bodies, be they stars or atoms, is non-
uniform, the trees are unbalanced.
This makes effective parallelization
difficult because a major impediment
Why should you read
the following paper?
Because it provides
a clear description
of solving a difficult
problem by
effectively combining
the best algorithms
with the best
computers,
while making use
of the most
appropriate strategies
for parallelization.
to efficient parallelization is effective
load balance. The paper then tackles
parallelism at multiple levels, starting
with distributed memory parallelism,
then continuing to multicore processes and finishing with GPUs. A very
pragmatic argument is made for the
use of hybrid parallel programming
models (specifically, MPI combined
with OpenMP and/or CUDA): a single
programming model is not always the
best solution for real applications.
Why should you read the following paper? Because it provides a clear
description of solving a difficult problem by effectively combining the best
algorithms with the best computers,
while making use of the most appropriate strategies for parallelization.
Throughout this work you will find
the use of simple yet effective complexity analysis to guide and explain
the choices that are made. This is
high-performance computing at its
best—using the best algorithms with
the biggest machines; using complexity analysis to engineer a high-performance solution, using a hierarchy of
programming models, each providing
high performance for that part of a
parallel computer. The result is a tool
that can be used by scientists to solve
problems that cannot be solved by
any other means—an example of how
computing has become critical to the
advancement of science.
William Gropp ( wgropp@illinoise.edu) is the Paul and
cynthia Saylor Professor of computer Science and deputy
director for research at the institute for advanced
computing applications and technologies at the
University of illinois Urbana-champaign, il.