A key goal of the optimization phase
is the reduction of communication overhead via a range of techniques, including execution of communication in
parallel with computation, elimination
of redundant communication, and exploitation of collective communication
primitives whenever possible. Overlap
analysis detects simple communication patterns (such as stencils), using
this information to improve local data
management, as well as the organization of communication. Gupta et al.
presented a general framework for optimizing communication in data-parallel
programs. In the final code-generation
phase, the optimized parallel program
is transformed into a Fortran program
with explicit message passing.
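The following sketch suggests the flavor of such generated code for a one-dimensional stencil. It is a minimal, hand-written illustration, not actual compiler output; the names, the single ghost cell per side, and the MPI-based formulation are our assumptions. A nonblocking halo exchange is initiated first, interior points that need no remote data are updated while the messages are in flight, and the boundary points are updated only after communication completes.

! Sketch of overlapping communication with computation for a
! Jacobi-style relaxation on a BLOCK-distributed 1D array.
! Hypothetical simplification: one ghost cell per side.
subroutine relax(u, unew, n, left, right, comm)
  use mpi
  implicit none
  integer, intent(in) :: n, left, right, comm
  double precision, intent(inout) :: u(0:n+1)   ! u(0), u(n+1) are ghost cells
  double precision, intent(out)   :: unew(1:n)
  integer :: reqs(4), ierr, i

  ! Post the nonblocking halo exchange with both neighbors.
  call MPI_Irecv(u(0),   1, MPI_DOUBLE_PRECISION, left,  0, comm, reqs(1), ierr)
  call MPI_Irecv(u(n+1), 1, MPI_DOUBLE_PRECISION, right, 1, comm, reqs(2), ierr)
  call MPI_Isend(u(1),   1, MPI_DOUBLE_PRECISION, left,  1, comm, reqs(3), ierr)
  call MPI_Isend(u(n),   1, MPI_DOUBLE_PRECISION, right, 0, comm, reqs(4), ierr)

  ! Update interior points; they need no remote data, so this work
  ! proceeds while the messages are in flight.
  do i = 2, n-1
    unew(i) = 0.5d0 * (u(i-1) + u(i+1))
  end do

  ! Wait for the halo values, then update the two boundary points.
  call MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE, ierr)
  unew(1) = 0.5d0 * (u(0)   + u(2))
  unew(n) = 0.5d0 * (u(n-1) + u(n+1))
end subroutine relax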
The inspector-executor paradigm [17] is an important method for runtime optimization of parallel loops not amenable to static analysis due to irregular array accesses (such as through subscripted subscripts).
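The essence of the paradigm can be conveyed by a small sketch: a serial model with invented names, in which the communication machinery is elided. An inspector pass scans the subscript array once, classifying each access as local or remote and recording a gather schedule; an executor pass fetches the remote values in bulk and then runs the loop. The cost of the inspector is amortized when the same schedule is reused across many sweeps.

! Sketch of the inspector-executor pattern for a loop with
! subscripted subscripts: y(i) = y(i) + x(idx(i)).
! Hypothetical serial model: "remote" elements of x are those outside
! this process's local range [lo, hi]; communication is elided.
program inspector_executor
  implicit none
  integer, parameter :: n = 8, lo = 1, hi = 4
  integer :: idx(n) = (/ 2, 7, 1, 5, 3, 8, 4, 6 /)
  double precision :: x(8), y(n), ghost(n)
  integer :: sched(n), nremote, i

  x = (/ (dble(i), i = 1, 8) /)
  y = 0.0d0

  ! Inspector: scan the subscript array once, recording which accesses
  ! are off-processor; the resulting schedule is reused on every sweep.
  nremote = 0
  do i = 1, n
    if (idx(i) < lo .or. idx(i) > hi) then
      nremote = nremote + 1
      sched(nremote) = idx(i)
    end if
  end do

  ! Executor, phase 1: gather the remote values named in the schedule
  ! (in generated parallel code this would be a bulk communication step).
  do i = 1, nremote
    ghost(i) = x(sched(i))
  end do

  ! Executor, phase 2: run the original loop, reading local elements
  ! directly and remote elements from the gathered buffer.
  nremote = 0
  do i = 1, n
    if (idx(i) >= lo .and. idx(i) <= hi) then
      y(i) = y(i) + x(idx(i))
    else
      nremote = nremote + 1
      y(i) = y(i) + ghost(nremote)
    end if
  end do

  print *, y
end program inspector_executor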
Experience with the Language
The initial response to HPF can be characterized as cautious enthusiasm. A
large part of the high-performance user
community was hopeful that the high-level abstractions provided by the language would make parallel programs
portable and efficient without requiring explicit control of message passing.
On the other hand, vendors hoped HPF
would expand the market for scalable
parallel computing. Several major vendors, including DEC, IBM, and Thinking Machines, initiated independent
compiler efforts; others offered OEM
versions of compilers produced by independent software companies (such
as Applied Parallel Research and the
Portland Group, Inc.). At its peak, 17
vendors offered HPF products and
more than 35 major applications were
written in HPF, at least one with more
than 100,000 lines of code.
Much HPF experience was reported
at meetings of the HPF Users Group in
Santa Fe, NM (1997), Porto, Portugal
(1998), and Tokyo (2000) [16]. The Tokyo
meeting was notable for demonstrating the strong interest in HPF in Japan,
later reemphasized by the Earth Simulator [19] featuring a high-functionality
HPF implementation supporting the
HPF/JA extensions. The IMPACT-3D
fluid simulation for fusion science on
the Earth Simulator achieved 40% of its
peak speed and was awarded a Gordon
Bell Prize at the Supercomputing conference in 2002 in Baltimore.
As the language was applied to a more diverse set of application programs, it became clear that in many
cases its expressive power allowed
the formulation of scientific codes
in a clearer, shorter, less error-prone
way than was possible based on explicit message passing. HPF did best
on simple, regular problems (such as
dense linear algebra and partial differential equations on regular meshes).
However, for some data-parallel applications it was difficult to achieve the
all-important goal of high target-code
performance. As a result, frustrated users, particularly in the U.S., switched to
MPI. This migration significantly reduced demand for HPF, leading compiler vendors to reduce or even abandon their development efforts. The end result was that HPF never achieved a sufficient level of acceptance among leading-edge users of high-performance
parallel computing systems.
Here, we explore the main reasons
for this development:
Missing language features. To
achieve high performance on a variety
of applications and algorithms, a parallel programming model must support a range of different kinds of data
distributions. The original HPF specification included three main classes of
such distributions—BLOCK, CYCLIC,
and CYCLIC(K)—motivated largely by
the requirements of regular algorithms
operating on dense matrices and well
adapted to the needs of linear algebra.
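To make the three classes concrete, the fragment below declares one array in each style (a minimal sketch; the array names and processor arrangement are invented). Because the directives are structured comments, the fragment remains legal Fortran.

      REAL A(1000,1000), B(1000,1000), C(1000)
!HPF$ PROCESSORS P(4)
!     BLOCK: each processor gets one contiguous chunk of 250 rows.
!HPF$ DISTRIBUTE A(BLOCK, *) ONTO P
!     CYCLIC: rows are dealt out round-robin, one at a time.
!HPF$ DISTRIBUTE B(CYCLIC, *) ONTO P
!     CYCLIC(K): round-robin in chunks of 50 elements.
!HPF$ DISTRIBUTE C(CYCLIC(50)) ONTO P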
However, many relevant applications
need more general distributions that
deal efficiently with dynamic and irregular data structures. Examples include multiblock and multigrid codes,
finite element codes (such as crash
simulations operating on dynamically
changing unstructured grids), spectral
codes (such as weather forecasting),
and distributed sparse matrix codes.
Algorithmic strategies in these codes
were difficult or even impossible to express within HPF without sacrificing
too much performance. Though this
problem was addressed to a certain
degree in HPF 2.0 through basic support for irregular data distributions,
the damage was done.
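For comparison, the HPF 2.0 extensions in question include generalized block and indirect distributions, sketched below with invented names: GEN_BLOCK permits contiguous blocks of unequal size, and INDIRECT maps each array element to a processor through a user-supplied mapping array, as irregular meshes require.

      INTEGER SIZES(4), MAP(1000)
      REAL X(1000), Y(1000)
!HPF$ PROCESSORS P(4)
!     Contiguous blocks of unequal, user-chosen sizes (SIZES must sum to 1000):
!HPF$ DISTRIBUTE X(GEN_BLOCK(SIZES)) ONTO P
!     MAP(I) names the abstract processor that owns element Y(I):
!HPF$ DISTRIBUTE Y(INDIRECT(MAP)) ONTO P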