ization, and so on.) The basic impact
of these performance features is to use
software-exposed parallelism to drive
much of the performance improvement. For example, not utilizing SSE
instructions and multiple cores in a
quad-core microprocessor leaves over
90% of the peak floating-point performance on the floor. Simply stated: the
trend to software-exposed parallelism is
also accelerating (at the exponential pace
of Moore’s Law).
For the last three decades, software
development has largely evolved to
improve productivity while hardware
development has largely evolved to
transparently deliver sustained performance improvements to this software.
This has resulted in a divergence that
must be reconciled. Many of the productivity-driven software development
trends are either at odds with hardware
performance trends or are outpacing
the abilities of tools and various frameworks to adapt. If this seems like a very
hardware-centric way to look at things,
I will restate this from a software developer’s perspective: microprocessor
architecture is evolving in a direction
that existing software components will
be hard-pressed to leverage.
This may seem like an overly bleak
outlook; it is intended to be a reality
check. It is almost certain that software
developers will adapt to parallelism incrementally. It is equally certain that
the physics of semiconductor manufacturing will not change in the coming years.
I have been on both sides of the
discussion between software and
hardware vendors. Software vendors
demand improved performance for
their applications through hardware
and tool enhancements (aka “the free
lunch”). Hardware vendors ask software vendors to make somewhat risky
(from a productivity and adoption point
of view) efforts to tap new performance features.
Most of the time, a middle path
between these perspectives is taken
wherein software vendors incrementally adopt performance features while
hardware vendors enhance tools and
provide consultative engineering support to ensure this happens. The end
result is/will be that the path to a complete refactoring of their applications
will take a longer, more gradual road.
This middle road may be all you can
hope for in applications that do not evolve
significantly in terms of either usage
modes or data intensiveness.
However, for many applications
there should be a long list of features
that are enabled or enhanced by the
parallelism in hardware. For those developers, there is a better, though still
risky, option: Embrace parallelism now
and architect your software (and the components used therein) to anticipate very
high degrees of parallelism.
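As an illustration of what architecting for "very high degrees of parallelism" can mean in practice, here is a minimal sketch (not from the article; the function name `chunked_sum` is hypothetical) of code that discovers its degree of parallelism at runtime instead of hard-coding it, so the same source scales from a quad-core desktop to a many-core part:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def chunked_sum(data, workers=None):
    """Split `data` into roughly one chunk per worker and sum the chunks
    concurrently.

    The degree of parallelism is queried at runtime (os.cpu_count()), so the
    decomposition widens automatically as core counts grow. Note that in
    CPython the GIL limits thread-level speedup for CPU-bound work; the point
    here is the scalable decomposition, which carries over unchanged to a
    ProcessPoolExecutor or to native SIMD/threaded kernels.
    """
    workers = workers or os.cpu_count() or 1
    chunk = max(1, len(data) // workers)
    chunks = [data[i:i + chunk] for i in range(0, len(data), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))
```

The design choice worth noting is that nothing in the caller's code mentions "4 cores" or "16 lanes"; the parallelism is a runtime parameter, which is what lets the software ride the hardware roadmap rather than being re-tuned for each generation.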
Embracing Parallelism Is Important
to Software Developers
The software architect and engineering
manager need only look at published
hardware roadmaps and extrapolate
forward a few years to justify this. Examining Intel’s public roadmap alone,
we get a sense of what is happening very
concretely. The width of vector instructions is going from 128 bits (SSE), to
256 bits (AVX [2]), to 512 bits (Larrabee [4])
within a two-year period. The richness
of the vector instructions is increasing
significantly, as well. The core counts
have gone from one to two to four to six
within a couple of years. And, two- and
four-way simultaneous multithreading
is back. Considering a quad-core processor that has been shipping for over
a year, the combined software-exposed
parallelism (in the form of multiple
cores and SSE instructions) is 16-way
(in terms of single-precision floating-point operations) in a single-socket
system (for example, desktop personal
computers). Leaving this performance
"on the floor" diminishes a key ingredient in the cycle of hardware and software improvements and upgrades that
has so greatly benefitted the computing industry.
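The arithmetic behind that 16-way figure, and the earlier "over 90%" claim, can be made explicit. A small sketch, using the quad-core, 128-bit SSE configuration described above (four single-precision lanes per instruction):

```python
# Software-exposed parallelism of the quad-core part described above.
cores = 4        # one hardware thread per core assumed
simd_lanes = 4   # 128-bit SSE = 4 single-precision floats per instruction

peak_parallelism = cores * simd_lanes   # 16-way, as stated in the text

# Scalar, single-threaded code uses one lane on one core:
utilization = 1 / peak_parallelism      # 1/16 = 6.25% of peak
left_on_floor = 1 - utilization         # 93.75% -- "over 90% on the floor"
```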
There is another rationalization I
have observed in certain market segments. In some application spaces,
performance translates more directly
to perceived end-user experience today. Gaming is a good example of this.
Increasing model complexity, graphics resolution, better game-play (influenced
by game AI), and improved
physical simulation are often enabled
directly by increasing performance
headroom in the platform. In application domains like this, the first-movers often enjoy a significant advantage
over their competitors (thus rationalizing the risk).
I do not mean to absolve hardware
and tool vendors from responsibility. But, as I previously mentioned,
hardware vendors tend to understand
the requirements from the examples
that software developers provide.
Tool vendors are actively working to
provide parallel programming tools.
However, this work is somewhat hampered by history. Parallel programming in the mainstream is relatively
new, while many of the tools and accumulated knowledge were informed
by niche uses. In fact, the usage models (for example, scientific computing)
that drive parallel computing are not
all that different from the programs I
was looking at 20 years ago in that parallel programming class. Re-architecting software now for scalability onto
(what appears to be) a highly parallel
processor roadmap for the foreseeable future will accelerate the assistance that hardware and tool vendors can provide.
References
1. Borkar, S. Design challenges of technology scaling. IEEE Micro (Mar.–Apr. 2006), 58–66.
2. Intel. Intel AVX (April 2, 2008); http://
3. Mitchell, N., Sevitsky, G., and Srinivasan, H. The diary of a datum: Modeling runtime complexity in framework-based applications. (2007); http://domino.research.
4. Seiler, L. et al. Larrabee: A many-core x86 architecture for visual computing. In Proceedings of ACM
Anwar Ghuloum (email@example.com) is a
principal engineer in Intel Corporation's Software and
Services Group, Santa Clara, CA.