0.0025
0.002
Insertion Sort
Quick Sort
Merge Sort
Radix Sort
Autotuned
Figure 4: Given a model of a DNA complex, the various
kinds of reactions can be analyzed by a simulation.
0.015
0.001
0.0005
0
0 250 500
750 1000
Input size
1250 1500 1750
0.12
Figure 5: Differences in performance are shown
for solving the eigenvalue problem using various
algorithms compared to our autotuned PetaBricks algorithm. “Cutoff 25” corresponds to the hard-coded
hybrid algorithm found in LAPACK.
0.1
Bisection
DC
QR
Cutoff 25
Autotuned
0.8
0.6
0.4
Mobile
0.2
0
0
200
400 600
Input size
800
1000
Direct Jacobi SOR Multigrid Autotuned 10000 1000 100 10 1 0.1 0.01 0.001 0.0001 e-05 Input size 1 10 100 1000
Figure 6: Differences in performance are shown for
solving the 2D Poisson’s equation up to an accuracy
level of 10^ 9 using different algorithms compared to
our autotuned PetaBricks algorithm.
“To obtain portable
performance in this
new world of more
diverse architectures,
we must build
programs that can
adapt to whatever
hardware platform
they are currently
running on.”
larger computation to communication
ratios) appear to perform better when
PetaBricks produces code with less
parallelism, suggesting that the cost of
communication often outweighs any
benefits from running code containing
fine-grained parallelism. On the other
hand, the Sun Niagara processor performs best when executing code with
lots of parallelism as shown by the exclusive use of recursive algorithms.
CURRENT AND FUTURE WORK
Variable accuracy. Different algorithmic choices available to the programmer often produce outputs of varying quality. We must understand the
trade-off between computation time
and accuracy in order to make the correct choices during autotuning. To expose these trade-offs to the compiler,
we added simple language extensions
to PetaBricks. With these extensions,
our compiler can perform fully automatic compile-time and install-time
autotuning and analysis to construct
optimized algorithms for any user-specified target accuracy.
Dynamic choices. One facet of optimizing numerical algorithms we
wish to explore is the selection of algorithms based on the numerical characteristics of the input, such as the
condition number of a matrix, or the
“pre-sortedness” of an array. We are
exploring the use of run-time analysis
for algorithm selection and parameter
tuning that may make decisions based
on deeper characteristics of the input
in addition to size.
Heterogeneous architectures. One
of the most difficult types of architecture to utilize is those with different