able silicon area per unit dollar into
exponential performance increases at
relatively fixed cost, what are we going
to do instead?
A Savior? Processor manufacturers have bet their future on a relatively
straightforward (for them) solution.
That is, if we can’t make one core execute a thread any faster, let’s just
place two cores on the die and modify
the software to utilize the extra core. In
the next generation, place four cores.
The generation after that, eight, and
so on. From a manufacturing standpoint, multicore or “manycore,” as this
approach is called, has several attractive qualities. First, we know how to
build systems with higher peak performance. If the software can utilize them,
then more cores per die will equate to
improved performance. Unlike the single-threaded case, where we have no clear ideas left for scaling performance, multicore appears to offer us a path to salvation.
Second, again, if the software is
there, a host of technological problems
are mitigated by multicore. For example, as long as thread communication
is kept to a minimum, it is more energy efficient to complete a fixed task
using multiple threads, compared to
executing one thread faster. Multiple
smaller, simpler cores are easier to
design than larger complex ones, thus
mitigating the design and verification
costs. Reliability, a growing problem in
processor design, also becomes easier:
simply place redundant cores on the
die and post-fabrication route requests
for defective units to one of the redundant cores, much as we do today with
DRAMs. Or, even simpler, map them
out entirely and sell a lower-cost part
to a different market segment, as Sun
Microsystems now does. Finally, wire
delay—that grand challenge that motivated a flurry of research almost a decade ago—is also mitigated: simpler
cores are smaller and clock frequency
can be reduced as performance can be
had through thread-level parallelism.
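The energy claim above follows from a standard first-order model, sketched here with my own normalization (not from the column): dynamic power scales as CV²f, and because supply voltage scales roughly with frequency, aggregate power scales roughly as n·f³ for n cores at frequency f.

```go
package main

import "fmt"

// power returns the aggregate dynamic power of nCores identical cores
// under the first-order model P ∝ V²·f with V ∝ f, i.e. P ∝ f³.
// Frequencies are normalized so one core at full speed has power 1.0.
func power(nCores int, freq float64) float64 {
	return float64(nCores) * freq * freq * freq
}

func main() {
	// One core at full frequency vs. two cores at half frequency:
	// both finish the same fixed task per unit time, assuming the
	// task parallelizes perfectly and threads rarely communicate.
	one := power(1, 1.0)
	two := power(2, 0.5)
	fmt.Printf("one fast core: %.2f, two slow cores: %.2f\n", one, two)
}
```

Under these (optimistic) assumptions, the two slower cores deliver the same throughput at a quarter of the power, which is the trade the text describes.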
All of this sounds fantastic, except
for one thing: it is predicated on the
software being multithreaded. Just as
important for future scalability, thread
parallelism must be found in software
at a rate commensurate with Moore’s
Law, which means if today we must find
four independent threads of computation, in two years there must be eight,
and two years after that 16.
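As an illustration of what utilizing those threads entails, here is a minimal sketch (function names and figures are mine, not the column's): a fixed task split across a configurable number of worker threads, where tracking Moore's Law means doubling nWorkers each generation, from four to eight to 16.

```go
package main

import (
	"fmt"
	"sync"
)

// parallelSum divides the input among nWorkers goroutines, one per
// core, and combines their partial results. Scaling with each
// processor generation means doubling nWorkers every two years.
func parallelSum(data []int, nWorkers int) int {
	partial := make([]int, nWorkers)
	var wg sync.WaitGroup
	chunk := (len(data) + nWorkers - 1) / nWorkers
	for w := 0; w < nWorkers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			lo, hi := w*chunk, w*chunk+chunk
			if lo > len(data) {
				lo = len(data)
			}
			if hi > len(data) {
				hi = len(data)
			}
			for _, v := range data[lo:hi] {
				partial[w] += v // disjoint slot per worker: no race
			}
		}(w)
	}
	wg.Wait()
	total := 0
	for _, p := range partial {
		total += p
	}
	return total
}

func main() {
	data := make([]int, 1000)
	for i := range data {
		data[i] = i + 1
	}
	// The answer is the same for 4, 8, or 16 workers; only the
	// available hardware parallelism exploited changes.
	fmt.Println(parallelSum(data, 4)) // sum of 1..1000 = 500500
}
```

Note the catch the text identifies: the program only runs faster if the problem actually decomposes into this many independent pieces.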
Processor manufacturers are not
asking a small favor from software developers. From a programmer’s perspective, multicore CPUs currently
look no different from Symmetric Multiprocessors (SMPs), which have been
around for decades. Such systems are
not widely deployed on the home and
business desktop, for good reason.
They cost more and there isn’t a significant performance advantage for
the applications these users employ.
So a reasonable question to ask is: what makes us think it's going to work this time?
An optimist will make the following
arguments: First, the cost difference is
now in the reverse direction. Assuming
we could build a faster single-threaded
core, it would cost more. Design, validation, cooling, and manufacturing would assure that. Second, we do know
more about parallel programming
now than ever before. Tools have actually improved, with methods to look
for race conditions and automatically
parallelize loops,10 and the resurgent
interest in transactional programming
will bear fruit. We’ve had many years
of successful experience using parallelism in the graphics, server, and scientific computing domains. Third,
and perhaps most importantly, it just
has to work. For this reason, software companies that need their products to achieve scalable performance must invest heavily in parallel programming.
The hope is the commercial emphasis on
parallel computing will create solutions.
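As a concrete instance of the tooling the optimist cites, several mainstream toolchains now ship dynamic race detectors (Go's `go run -race`, for example), and a loop whose iterations are independent can be parallelized by hand much as an auto-parallelizing compiler would do it. A sketch under those assumptions (the function name is mine):

```go
package main

import (
	"fmt"
	"sync"
)

// scaleAll doubles every element in place. Each goroutine owns a
// disjoint index, so there are no conflicting writes and the race
// detector stays quiet. Accumulating into one shared variable
// without a lock, by contrast, is exactly the kind of bug such
// tools catch. (A real compiler would chunk iterations rather than
// spawn one goroutine per element; this keeps the sketch minimal.)
func scaleAll(xs []float64) {
	var wg sync.WaitGroup
	for i := range xs {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			xs[i] *= 2 // independent iteration: safe in parallel
		}(i)
	}
	wg.Wait()
}

func main() {
	xs := []float64{1, 2, 3}
	scaleAll(xs)
	fmt.Println(xs) // [2 4 6]
}
```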
A pessimist will counter as follows:
Parallelism on the desktop has never
worked because the technical requirements it takes to write threaded code
just don’t align with the economic forces driving desktop software developers.
Writing parallel code is more difficult
than writing sequential code. It's more error-prone and harder to debug, due to the non-determinism of thread
memory interleavings. Furthermore,
I have yet to meet anyone who thinks
the industry will successfully parallelize its large legacy code bases. Once a
large application has been designed
for a single-threaded execution model,
it is extremely difficult to tease it apart
and parallelize it. What this means is
programmers must feel the economic