HPCS productivity assessment team,
for their invaluable contributions to
1. Asanovic, K., Bodik, R., Catanzaro, B.C., Gebis, J.,
Husbands, P., Keutzer, K., Patterson, D., Plishker, W.,
Shalf, J., Williams, S. and Yelick, K. The landscape of
parallel computing research: a view from Berkeley.
Technical Report No. UCB/EECS-2006-183. Electrical
Engineering and Computer Sciences, University of
California at Berkeley, 2006; http://www.eecs.berkeley.
2. Bader, D., Madduri, K., Gilbert, J., Shah, V., Kepner, J.,
Meuse, T. and Krishnamurthy, A. Designing scalable
synthetic compact applications for benchmarking high
productivity computing systems; http://www.cse.psu.
3. Danis, C. and Halverson, C. The value derived from
the observational component in an integrated
methodology for the study of HPC programmer
productivity. In Proceedings of the Third Workshop on
Productivity and Performance in High-End Computing,
4. Dijkstra, E., Feijen, W. and van Gasteren, A. Derivation
of a termination detection algorithm for distributed
computations. Information Processing Letters 16, 5
5. Ebcioglu, K., Sarkar, V., El-Ghazawi, T. and Urbanic, J.
An experiment in measuring the productivity of three
parallel programming languages. In Proceedings of
the Third Workshop on Productivity and Performance
in High-End Computing, (2006), 30–36.
6. Halverson, C. and Danis, C. Towards an ecologically
valid study of programmer behavior for scientific
computing. In Proceedings of the First Workshop on
Software Engineering for Computational Science and
7. Saraswat, V.A., Kambadur, P., Kodali, S., Grove, D.
and Krishnamoorthy, S. Lifeline-based global load
balancing. In Proceedings of the 16th ACM Symposium
on Principles and Practice of Parallel Programming,
John Richards is a research manager in IBM’s Watson
Group and holds an appointment as Honorary Professor
in the School of Computing at the University of Dundee,
Jonathan Brezin was trained as a mathematician and
held positions at the University of Minnesota and the
University of North Carolina in the 1960s and 1970s. He
later joined IBM Research, from which he retired this year.
Cal Swart joined IBM Research in 1982 and is currently a
senior technical staff member in the IBM Watson Group.
He is a member of Watson Life Research, exploring new
applications of cognitive computing.
Christine Halverson is an independent consultant in
Silicon Valley. Formerly she worked at IBM Research
where she spent five years working on the DARPA HPCS
initiative studying parallel programmers.
© 2014 ACM 0001-0782/14/11 $15.00.
Error handling. The lack of a convenient exception mechanism in C forces programmers to be more verbose.
This surfaced in Floyd’s algorithm,
for example, when our programmer
wanted a generic input stream to read
an ASCII stream of numeric values.
His API had an entry point that tokenized
the stream, converted the tokens to
the appropriate numeric type, and ensured each value was legal. Clearly there
are a number of problems the stream
can encounter. The question is how to
handle these errors.
In X10, an error raises an exception whose type identifies the problem encountered. An
application can avoid providing special code for detecting generic errors
such as an unexpected end-of-file
that is discovered by lower-level functions, because they, too, can signal
errors by throwing exceptions. The
application can therefore concentrate on signaling semantic errors in
For most throwaway code, error
handling is not a serious issue, but
for production code, and for complex
communications patterns such as Dijkstra
et al.’s termination algorithm, it
certainly is. C’s signaling mechanism
is best suited for expert use. C’s problems,
however, run even deeper in a
multithreaded SPMD world. Consider
the standard library routine strtoll, which
the stream calls to convert a found
token to a long integer. Here is the
discussion of strtoll’s error indications
as given by its “man” page:
“the strtol, strtoll... functions
return the result of the conversion,
unless the value would underflow or
overflow. If no conversion could be performed,
0 is returned and the global
variable errno is set to EINVAL (the
last feature is not portable across all
platforms). If an overflow or underflow
occurs, errno is set to ERANGE…”
Consider the code a C application
needs in order to deal with the various
possible errors. Should the code
make sure to zero errno before calling
strtoll? After all, it might be
non-zero because of an entirely unrelated
earlier problem. Moreover, for the code
that checks errno to understand
what happened, it is not
enough just to check that errno is
non-zero, because the error may be an
I/O error, a parsing error, or a range
error. Nor can you be sure errno is
thread-safe; it is not on all systems.
What then? And where in the application
should you clear errno, which
is a global variable? Which of the other
processes need to be made aware of
the problem, and how should they be
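To make the bookkeeping concrete, here is a minimal sketch of what a careful caller of strtoll might write. It is an illustration, not our programmer's code: the name parse_ll and the parse_status values are hypothetical, and the sketch assumes a platform on which errno is thread-local, as C11 and POSIX threads specify. It answers only the local questions (when to clear errno and how to tell the failure modes apart); it says nothing about how other processes should be informed.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical status codes for this sketch; they are not from the article. */
enum parse_status { PARSE_OK, PARSE_NO_DIGITS, PARSE_TRAILING, PARSE_RANGE };

/* Convert one token to a long long, reporting each failure mode separately.
 * *out is written only when PARSE_OK is returned. */
enum parse_status parse_ll(const char *token, long long *out)
{
    char *end = NULL;

    errno = 0;                          /* discard any stale, unrelated error */
    long long value = strtoll(token, &end, 10);
    int saved = errno;                  /* capture before any later call changes it */

    if (end == token)
        return PARSE_NO_DIGITS;         /* no conversion; does not rely on EINVAL */
    if (saved == ERANGE)
        return PARSE_RANGE;             /* overflow or underflow */
    if (*end != '\0')
        return PARSE_TRAILING;          /* unconverted characters after the number */

    *out = value;
    return PARSE_OK;
}
```

Checking the end pointer rather than EINVAL sidesteps the portability caveat in the man page, and copying errno immediately limits the window in which another library call could overwrite it. Every call site still has to repeat this pattern, which is the verbosity at issue.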
There are very good reasons for C
and MPI’s dominance in the parallel-programming community. They are
superbly documented, elegant, clean
designs that have been carefully implemented. In the hands of an experienced, disciplined professional, they
provide a level of control that is simply
not available elsewhere. Neither C nor
MPI is a standing target: both continue
to be improved, even though they are
now mature technologies.
How often, however, do the benefits
of C/MPI outweigh its costs? Through
all three of our studies, and particularly in this final one, we have seen substantial benefits—ranging from two
to six times faster development to first
successful parallel run—from using a
higher-level language and programming model. These productivity benefits might be even greater for large
programs that need to be maintained
X10 and its APGAS programming
model are now being explored in many
research and university settings. While
the language may or may not cross over
into mainstream use, the
qualities that made it so much more
productive for us will likely become
well established in the parallel community: flexible threading, automatic
garbage collection, runtime type-driven error checking, partitioned global
memory, and rooted exception handling are all valuable. We hope our experiments encourage those looking to
improve parallel-programmer productivity to seriously study X10’s design
and its benefits.
This work was supported by the Defense Advanced Research Projects
Agency under its Agreement No.
HR0011-07-9-0002. The authors thank
Catalina Danis, Peter Malkin, and
John Thomas, members of IBM’s