summaries are valuable documentation in their own right; they promote lifetime benefits for modularity and maintainability, arguably compensating for the upfront programmer effort. Finally, a static approach incurs no overhead and causes no surprises at runtime.
In contrast, purely runtime approaches impose less burden on the programmer, but their overheads may still be too high in some cases. Further, a runtime approach inherently cannot provide the guarantees of a static approach before shipping, and it remains susceptible to surprises in the field.
We are optimistic that recent approaches have opened many promising new avenues toward disciplined shared memory that can overcome the problems described here. A final solution will likely consist of a judicious combination of language and runtime features, and will derive from a rich line of future research.
Implications for Hardware
As discussed earlier, current hardware memory models are an imperfect match for even current software (data-race-free) memory models. ISA changes that identify individual loads and stores as synchronization can alleviate some short-term problems. An established ISA, however, is difficult to change, especially when existing code works mostly adequately and there is not enough experience to document the benefits of the change.
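To make the data/synchronization distinction concrete, the following minimal sketch uses C++11 atomics, where the programmer explicitly marks which individual loads and stores are synchronization. The variable and function names (`payload`, `ready`, `producer`, `consumer`, `run_message_passing`) are illustrative only; the example shows the kind of software-level annotation an ISA extension could carry down to hardware and is not a proposal for any particular ISA encoding.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Ordinary data variable: it is never written concurrently with another
// access, so the program as a whole stays data-race-free.
int payload = 0;

// Synchronization variable: every access to it is an explicitly marked atomic.
std::atomic<bool> ready{false};

void producer() {
    payload = 42;                                  // data store
    ready.store(true, std::memory_order_release);  // synchronization store
}

int consumer() {
    // Synchronization load; acquire pairs with the producer's release.
    while (!ready.load(std::memory_order_acquire)) {
        // spin until the release store becomes visible
    }
    int value = payload;  // data load
    assert(value == 42);  // holds because the acquire synchronizes-with the release
    return value;
}

// Runs one producer and one consumer and returns what the consumer observed.
int run_message_passing() {
    int observed = 0;
    std::thread t1(producer);
    std::thread t2([&observed] { observed = consumer(); });
    t1.join();
    t2.join();  // join makes the write to observed visible here
    return observed;
}
```

Only the accesses to `ready` are synchronization: the hardware must keep the `payload` store ahead of the release store and the acquire load ahead of the later `payload` read, but it remains free to reorder ordinary data accesses among themselves.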
Academic researchers have taken an alternative path that uses complex mechanisms (for example, Blundell et al.6) to speculatively remove the constraints imposed by fences, rolling back the speculation when it turns out that the constraints were actually needed. While these techniques have been shown to work well, they come at an implementation cost and do not directly confront the root of the problem: mismatched hardware and software views of concurrency semantics.
Taking a longer-term perspective, we believe a more fundamental solution to the problem will emerge from a co-designed approach, in which future multicore hardware research evolves in concert with the software models research discussed in “Implications for Languages.” The current state of hardware technology makes this a particularly opportune time to embark on such an agenda. Power and complexity constraints have led industry to bet that future single-chip performance increases will largely come from increasing numbers of cores. Today’s cache-coherent multicore hardware designs, however, are optimized for few cores; achieving power-efficient performance scaling to several hundred or a thousand cores without considering software requirements will be difficult.
We view this challenge as an opportunity not only to resolve the problems discussed in this article but, in doing so, to build more effective hardware and software. First, we believe that hardware that takes advantage of the emerging disciplined software programming models is likely to be more efficient than a software-oblivious approach. This observation already underlies the work on relaxed hardware consistency models; we hope the difference this time around will be that the software and hardware models evolve together rather than as retrofits for each other, providing more effective solutions. Second, hardware research to support the emerging disciplined software models is also likely to be critical. Hardware support can enforce the required discipline efficiently where static approaches fall short, for example, by directly detecting violations of the discipline, by providing effective strategies to sandbox untrusted code, or both.
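As one illustration of what directly detecting violations of the discipline might involve, the sketch below models in software a hypothetical per-phase conflict check. It assumes that within one parallel phase (between barriers), tasks may share a location only if every access to it is a read, and it flags any write by one task combined with an access by another. The class name `PhaseConflictDetector`, the integer address and task encoding, and the clear-at-barrier policy are illustrative assumptions, not a description of any proposed hardware design.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical software model of a conflict detector for one parallel phase.
class PhaseConflictDetector {
public:
    // Records an access by `task`; returns false if it violates the discipline.
    bool access(std::size_t addr, int task, bool is_write) {
        Entry& e = table_[addr];
        if (is_write) {
            // A write conflicts with any earlier access by a different task.
            if ((e.writer && *e.writer != task) || has_other_reader(e, task)) {
                ++violations_;
                return false;
            }
            e.writer = task;
        } else {
            // A read conflicts only with a write by a different task.
            if (e.writer && *e.writer != task) {
                ++violations_;
                return false;
            }
            e.readers.push_back(task);
        }
        return true;
    }

    // A barrier ends the phase, so per-location history is discarded.
    void end_phase() { table_.clear(); }

    std::size_t violations() const { return violations_; }

private:
    struct Entry {
        std::optional<int> writer;  // task that wrote this location, if any
        std::vector<int> readers;   // tasks that read this location
    };

    static bool has_other_reader(const Entry& e, int task) {
        for (int r : e.readers) {
            if (r != task) return true;
        }
        return false;
    }

    std::unordered_map<std::size_t, Entry> table_;
    std::size_t violations_ = 0;
};
```

A hardware realization would need to keep this state in something like cache metadata rather than a hash table; the model only makes the checking rule explicit.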
Along these lines, we have recently begun the DeNovo hardware project at Illinois15 in concert with DPJ. We are exploiting DPJ-like region and effect annotations to design more power- and complexity-efficient, software-driven communication and coherence protocols and task-scheduling mechanisms. We also plan to provide hardware and runtime support to handle cases where DPJ’s static information and analysis might fall short. As such co-designed models emerge, we expect them ultimately to drive the future hardware-software interface, including the ISA.
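One way region and effect annotations could simplify coherence can be sketched under stated assumptions. If each parallel phase declares which regions it may write, a core's private cache can, at the phase boundary, self-invalidate only the lines in those regions instead of relying on invalidation messages. The `Region` identifiers, the `PrivateCache` class, and the `end_phase` interface are hypothetical; this is a toy model of the idea, not the DeNovo protocol.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <unordered_map>

using Region = int;  // hypothetical region identifier derived from annotations

// Toy private cache whose lines are tagged with the region they belong to.
class PrivateCache {
public:
    void fill(std::size_t addr, Region r, int value) { lines_[addr] = Line{r, value}; }

    bool holds(std::size_t addr) const { return lines_.count(addr) != 0; }

    // At a phase boundary, drop only lines in regions that the phase's declared
    // write effects say may have changed. Lines in regions that were only read
    // stay valid without any coherence messages. Returns the number dropped.
    std::size_t end_phase(const std::set<Region>& written_regions) {
        std::size_t dropped = 0;
        for (auto it = lines_.begin(); it != lines_.end();) {
            if (written_regions.count(it->second.region) != 0) {
                it = lines_.erase(it);
                ++dropped;
            } else {
                ++it;
            }
        }
        return dropped;
    }

private:
    struct Line {
        Region region;
        int value;
    };
    std::unordered_map<std::size_t, Line> lines_;
};
```

The shortcut is safe only under the discipline the annotations promise: within a phase no task reads a location that another task writes, so stale copies matter only after the boundary. The toy also ignores write-back of dirty data, which a real protocol would have to handle.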
Conclusion
This article gives a perspective based
on work collectively spanning approximately 30 years. We have been repeat-