and program productivity. In the best
case, parallelism enables new implementations of languages and features;
for example, parallel garbage collectors reduce the pause time of computational threads, thereby enabling the
use of safe languages in applications
with real-time constraints.
Another approach that trades performance for productivity is to hide the
underlying parallel implementation.
Domain-specific languages and libraries can provide an implicitly parallel
programming model that hides parallel programming from most developers, who instead use abstractions
with semantics that do not change
when running in parallel. For example,
Google’s MapReduce library utilizes
a simple, well-known programming
paradigm to initiate and coordinate independent tasks; equally important, it
hides the complexity of running these
tasks across a large number of computers.3 The language and library implementers may struggle with parallelism,
but other developers benefit from multicore without having to learn a new programming model.
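The division of labor described above can be sketched with a toy, purely sequential version of the map/reduce pattern (the names and scaffold here are illustrative, not Google's actual API). The point is that the developer writes only the two sequential functions; a real library could run those same functions unchanged across thousands of machines:

```python
from collections import defaultdict

def map_fn(document):
    # Developer-supplied sequential logic: emit (word, 1) per word.
    for word in document.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Developer-supplied sequential logic: sum the counts for a word.
    return (word, sum(counts))

def map_reduce(inputs, map_fn, reduce_fn):
    # A stand-in for the library: group mapped pairs by key, then reduce.
    # A parallel implementation could partition this work without changing
    # the semantics the developer sees.
    groups = defaultdict(list)
    for item in inputs:
        for key, value in map_fn(item):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

print(map_reduce(["to be or not to be"], map_fn, reduce_fn))
# {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```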

Parallel software. Another major
category of applications and systems
already takes advantage of parallelism;
the two most notable examples are servers and high-performance computing,
each providing different but important
lessons to systems developers.
Servers have long been the main
commercially successful type of parallel system. Their “embarrassingly parallel” workload consists of mostly independent requests that require little or
no coordination and share little data.
As such, it is relatively easy to build a
parallel Web server application, since
the programming model treats each
request as a sequential computation.
Building a Web site that scales well is
an art; scale comes from replicating
machines, which breaks the sequential
abstraction, exposes parallelism, and
requires coordinating and communicating across machine boundaries.
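The server programming model described above can be sketched in a few lines, assuming a hypothetical handle_request function and using a thread pool as a stand-in for a real server framework. Each request is written as ordinary sequential code; the framework, not the developer, supplies the parallelism by dispatching independent requests to workers:

```python
from concurrent.futures import ThreadPoolExecutor

def handle_request(request):
    # Purely sequential logic; requests share no mutable state,
    # so running many of them concurrently needs no coordination.
    return f"processed {request}"

requests = [f"req-{i}" for i in range(8)]
with ThreadPoolExecutor(max_workers=4) as pool:
    # The pool runs handlers in parallel; results keep input order.
    responses = list(pool.map(handle_request, requests))

print(responses[0])  # processed req-0
```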
High-performance computing followed a different path, one that adopted parallel hardware because there was no alternative with comparable performance, not because scientific and technical computations are especially well suited to parallel solutions. Parallel
hardware is a tool for solving problems.
The popular programming models—
MPI and OpenMP—are performance-focused, error-prone abstractions that
developers find difficult to use. More
recently, game programming emerged
as another realm of high-performance
computing, with the same attributes
of talented, highly motivated programmers spending great effort and time
to squeeze the last bit of performance
from complex hardware.19
If parallel programming is to be a
mainstream programming model, it
must follow the path of servers, not of
high-performance computing. One alternative paradigm for parallel computing, “Software as a Service,” delivers software functionality across the Internet
and revisits timesharing by executing
some or all of an application on a shared
server in the “cloud.”2 This approach to
computing, like servers in general, is
embarrassingly parallel and benefits directly from Moore’s Dividend. Each application instance runs independently
on a processor in a server. Moore’s Dividend accrues directly to the service provider, even if the application is sequential. Each new generation of multicore
processors halves the number of computers needed to serve a fixed workload or provides the headroom needed to add features or handle greater workloads.
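The arithmetic behind that claim is simple; the workload figure below is assumed purely for illustration. If each application instance is sequential and bound to one core, doubling the cores per processor halves the machines needed:

```python
workload = 16000          # concurrent application instances (assumed figure)
instances_per_core = 1    # one sequential instance per core (assumption)

def machines_needed(cores_per_processor):
    capacity = cores_per_processor * instances_per_core
    # Round up: a partially used machine is still a whole machine.
    return -(-workload // capacity)

for cores in (4, 8, 16):
    print(cores, machines_needed(cores))
# Each doubling of cores halves the server count: 4000, 2000, 1000.
```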
Despite the challenges of creating a
new software paradigm and industry,
this model of computation is likely to
be popular, particularly for applications
that do not benefit from multicore.
Moore’s Dividend was spent in many
ways and places, ranging from programming languages, models, architectures, and development practices,
up through software functionality.
Parallelism is not a surrogate for faster
processors and cannot directly step
into their roles. Multicore processors
will change software as profoundly as
previous hardware revolutions (such
as the shift from vacuum tubes to transistors or transistors to integrated circuits) radically altered the size and cost
of computers, the software written for
them, and the industry that produced
and sold the hardware and software.
Parallelism will drive software in new
directions (such as computationally intensive, game-like interfaces or services
provided by the cloud) rather than con-
tinuing the evolutionary improvements
made familiar by Moore’s Dividend.
Many thanks to Al Aho (Columbia University), Doug Burger (Microsoft), David Callahan (Microsoft), Dennis Gannon (Microsoft), Mark Hill (University
of Wisconsin), and Scott Wadsworth
(Microsoft) for helpful comments and
to Oliver Foehr (Microsoft) and Nachi
Nagappan (Microsoft) for assistance
with Microsoft data.
1. Aiken, M., Fähndrich, M., Hawblitzel, C., Hunt, G., and Larus, J.R. Deconstructing process isolation. In Proceedings of the ACM SIGPLAN Workshop on Memory Systems Performance and Correctness (San Jose, CA, Oct.). ACM Press, New York, 2006, 1–10.
2. Carr, N. The Big Switch: Rewiring the World, From Edison to Google. W.W. Norton, New York, 2008.
3. Dean, J. and Ghemawat, S. MapReduce: Simplified data processing on large clusters. Commun. ACM 51, 1 (Jan. 2008), 107–113.
4. Ekman, M., Warg, F., and Nilsson, J. An in-depth look at computer performance growth. ACM SIGARCH Computer Architecture News 33, 1 (Mar. 2005), 144–147.
5. Foehr, O. Personal email communications, June 30, 2008.
6. Gates, B. Personal email (Apr. 10, 2008).
7. Hachman, M. Intel's Gelsinger predicts Intel Inside everything. PC Magazine (July 3, 2008).
8. Hill, M.D. and Marty, M.R. Amdahl's Law in the multicore era. IEEE Computer 41, 7 (July 2008), 33–38.
9. Intel. The evolution of a revolution. Santa Clara, CA, 2008; download.intel.com/pressroom/kits/
10. Intel. Excerpts from A Conversation with Gordon Moore: Moore's Law. Video transcript, Santa Clara, CA, 2005; ftp://download.intel.com/museum/Moores_
11. Jann, J., Burugula, R.S., Dubey, N., and Pattnaik, P. End-to-end performance of commercial applications in the face of changing hardware. ACM SIGOPS Operating Systems Review 42, 1 (Jan. 2008), 13–20.
12. Kejariwal, A., Hoflehner, G.F., Desai, D., Lavery, D.M., Nicolau, A., and Veidenbaum, A.V. Comparative characterization of SPEC CPU2000 and CPU2006 on Itanium architecture. In Proceedings of the 2007 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems (San Diego, CA, June). ACM Press, New York, 2007, 361–362.
13. Lohr, S. and Markoff, J. Windows is so slow, but why? New York Times (Mar. 27, 2006); www.nytimes.
14. Maraia, V. The Build Master: Microsoft's Software Configuration Management Best Practices. Addison-Wesley, Upper Saddle River, NJ, 2006.
15. McGraw, G. Software Security: Building Security In. Addison-Wesley Professional, Boston, MA, 2006.
16. Mitchell, N., Sevitsky, G., and Srinivasan, H. The diary of a datum: An approach to analyzing runtime complexity in framework-based applications. In Proceedings of the Workshop on Library-Centric Software Design (San Diego, CA, Oct.). ACM Press, New York, 2005, 85–90.
17. Moore, G.E. Cramming more components onto integrated circuits. Electronics 38, 8 (Apr. 1965), 56–59.
18. Olukotun, K. and Hammond, L. The future of microprocessors. ACM Queue 3, 7 (Sept. 2005), 26–29.
19. Sweeney, T. The next mainstream programming language: A game developer's perspective. In Proceedings of the 33rd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (Charleston, SC, Jan.). ACM Press, New York, 2006, 269–269.
20. Yelick, K. Alt-tab. Discussion on parallelism at Microsoft Research, Redmond, WA, July 19, 2006.
James Larus (email@example.com) is Director of Software Architecture in the Cloud Computing Futures project at Microsoft Research, Redmond, WA.
© 2009 ACM 0001-0782/09/0500 $5.00