in C is a significant
problem as well.
In an object-oriented
as X10, objects
know how big
they are and
tion: how does a process know when
not only it is out of work, but all the
other processes are also out of work?
Algorithms to handle this (at least at
lower orders of scale) date back some
30 years to Dijkstra et al.
4 and have
been refined a number of times since.
The SPMD solution is based on them.
The X10 solution is entirely different,
relying on threading on demand.
Let’s look at X10 first. A single
X10 activity begins the action by asking each peer in a set to start searching a part of the tree. If one of these
peers runs out of work, it has a list of
“nearby” peers it is permitted to ask
for more work. It does this by sending a reference to itself to each of its
neighbors and then lets its current
activity die. The object sending these
requests, however, continues to exist.
Thus, if one of the idle peer’s neighbors has work to spare, it can spawn a
new activity at the idle peer’s place in
which that peer can resume work. A little handshaking is involved if the peer
wants to ensure only one neighbor’s
work is taken on, but otherwise that is
the whole story. Since the root activity
has spawned the multiple peer activities as asyncs within a single finish block, it can merely exit that block
when none of these asyncs remains
active thereby finishing the job.
The C solution is entirely different.
There is no peer object at each processor ready to receive a remote process
call, and even if there were, there is no
analog of the X10 finish block. The
naïve way of handling an idle peer is to
let it hang on a synchronous receive,
but then who is responsible for knowing when to send that peer the “all
done” message? That is exactly the
problem Dijkstra et al. were addressing by a clever method of polling all
In the C solution each processor
must perform three activities: the
primary work (namely, searching and
processing its share of the tree); lis-
tening for a termination control mes-
sage; and listening to its neighbors
for shared work requests. By using
two nonblocking receives to listen for
the control and sharing communica-
tions, each processor has, in effect,
three concurrent activities: the pri-
mary activity and the two pending re-
ceives. Each processor’s main loop is
responsible for dealing with incom-
ing control and sharing requests in a
timely manner. Most of the complex-
ity is involved in deciding who sends
what control information to whom.
Managing memory and handling
errors. The issues discussed previously
pertain to the fit between the natural
threading models of our programs on
the one hand and the X10 APGAS and
MPI SPMD parallel models on the oth-
er. There are also several well-known
C-related shortcomings that impacted
our productivity in writing serial sec-
tions of code. These shortcomings are
not present in X10. On average, we en-
countered about six or more of these
well-known problems per 1,000 lines
of C code, and the overall impact was
significant. Many required addition-
al effort because they did not reveal
themselves close to the point where a
correction was needed.
Memory leaks. X10, like many re-
cent languages, has automatic gar-
bage collection. The need to explic-
itly manage storage has long been
recognized as one of the serious im-
pediments to C coding productivity.
You might think that in a problem
as simple as, for example, SSCA 1,
with around 1,200 lines of relatively
code, this would not be an issue—but
it is, especially if the code is to run for
a long time (or be incorporated into
a frequently called library routine).
Here memory leaks need to be found
and eliminated, a task that can eas-
ily take one or two days for the places
found in our code where memory was
allocated but never freed.
Getting memory references right. The
lack of introspection in C is a significant problem as well. In an object-oriented language such as X10, objects
know how big they are and what outstanding references exist. In C those
calculations must be done by hand.
It is not that, using X10, you cannot
miscompute an array extent. It is still
possible, but X10 will catch an out-of-bounds access to an array at runtime
at the point of the access. Even skilled
C programmers are unlikely to take
the time to bulletproof every array access. Since errant accesses are usually
detected far from the point where the
code needs to be corrected, these errors are not easily found and fixed.