job with the nüvi 850. I can’t wait to
see what the future will bring!
(Voice-based access to email on the road? It
seems almost within reach.)
Disclaimer: The views expressed
here do not necessarily reflect the views
of my employer, ACM, or any other entity besides myself.
Reader’s comment:
Information I’ve read lately on the topic
of speech recognition indicates that a
device’s ability to correctly recognize
commands depends in large measure on
the quietness of the environment. I have
often found that voice systems on my cell
phone don’t work well unless I find a quiet
place to access them. So it is good to hear
that Garmin has found an effective way
to interpret commands while driving—an
environment that you note can be noisy.
As you speak of future enhancements,
it brings up the issue of what drivers
should be able to do while on the road.
Multitasking is great, but I’m not sure
email while driving is such a good idea…
—Debra Gouchy
From Daniel Reed’s
“When Petascale Is
Just Too Slow”
It seems as if it were just
yesterday when I was at
the National Center for
Supercomputing Applications and we
deployed a one teraflop Linux cluster
as a national resource. We were as
excited as proud parents by the configuration: 512 dual-processor nodes
(1 GHz Intel Pentium III processors),
a Myrinet interconnect, and (gasp) a
stunning 5 terabytes of RAID storage.
It achieved a then-astonishing 594 gigaflops on the High-Performance LINPACK benchmark, and was ranked
41st on the Top500 list.
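As a rough check on these figures, the sketch below recomputes the cluster's theoretical peak and its LINPACK efficiency. It assumes one floating-point operation per processor per clock cycle; that figure is not stated in the text, though it is consistent with the quoted one-teraflop rating. The variable names are illustrative.

```python
# Back-of-the-envelope check of the cluster figures quoted above.
# ASSUMPTION: each processor completes one floating-point operation per cycle.
NODES = 512
PROCESSORS_PER_NODE = 2
CLOCK_GHZ = 1.0
FLOPS_PER_CYCLE = 1  # assumed, not stated in the text

# Theoretical peak in gigaflops: processors x clock rate x flops per cycle.
peak_gflops = NODES * PROCESSORS_PER_NODE * CLOCK_GHZ * FLOPS_PER_CYCLE

# Measured LINPACK result quoted in the text.
linpack_gflops = 594.0
efficiency = linpack_gflops / peak_gflops

print(f"Theoretical peak: {peak_gflops:.0f} gigaflops")
print(f"LINPACK efficiency: {efficiency:.1%}")
```

Under that assumption the peak comes to roughly one teraflop, and the measured result is a bit under three-fifths of it.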
The world has changed since then.
We hit the microprocessor power
(and clock rate) wall, birthing the
multicore era; vector processing returned incognito, renamed as graphics processing units (GPUs); terabyte
disks are available for a pittance at
your favorite consumer electronics
store; and the top-ranked system on
the Top500 list broke the petaflop
barrier last year, built from a combination of multicore processors and
gaming engines. The last is interesting for several reasons, both sociological and technological.
Petascale Retrospective
On the sociological front, I remember
participating in the first petascale
workshop at Caltech in the 1990s.
Seymour Cray, Burton Smith, and
others were debating future petascale hardware and architectures, a
second group was debating device
technologies, a third was discussing
application futures, and a final group
of us was down the hall debating future software architectures. All this
was prelude to an extended series of
architecture, system software, programming models, algorithms, and
applications workshops that spanned
several years and multiple retreats.
At the time, most of us were convinced that achieving petascale performance within a decade would require new architectural approaches
and custom designs, along with radically new system software and programming tools. We were wrong, or
at least so it superficially seems. We
broke the petascale barrier in 2008,
using commodity x86 microprocessors and GPUs, InfiniBand interconnects, minimally modified Linux, and
the same message-based programming model we have been using for
the past 20 years.
However, as peak system performance has risen, the number of users
has declined. Programming massively
parallel systems is not easy, and even
terascale computing is not routine.
Horst Simon explained this with an interesting analogy, which I have taken
the liberty of elaborating slightly. The
ascent of Mt. Everest by Edmund Hillary and Tenzing Norgay in 1953 was
heroic. Today, amateurs still die each
year attempting to replicate the feat.
We may have scaled Mt. Petascale, but
we are far from making it pleasant or
even a routine weekend hike.
This raises the real question: Were
we wrong in believing different hardware and software approaches would
be needed to make petascale computing a reality? I think we were absolutely right that new approaches were
needed. However, our recommendations for a new research and development agenda were not realized. At
least in part, I believe this is because
we have been loath to mount the integrated research and development
needed to change our current hardware/software ecosystem and procurement models.
Exascale Futures
Evolution or revolution? It’s the persistent question. Can we build reliable exascale systems from extrapolations of current technology or will
new approaches be required? There
is no definitive answer as almost any
approach might be made to work at
some level with enough heroic effort.
The bigger question is: What design
would enable the most breakthrough
scientific research in a reliable and
cost-effective way?
My personal opinion is that we
need to rethink some of our dearly
held beliefs and take a different approach. The degree of parallelism
required at exascale, even with future
many-core designs, will challenge
even our most heroic application
developers, and the number of components will raise new reliability and
resilience challenges. Then there are
interesting questions about many-core memory bandwidth, achievable
system bisection bandwidth, and I/O
capability and capacity. There are
just a few programmability issues
as well!
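To make the reliability concern concrete, here is a minimal sketch of how mean time between failures (MTBF) shrinks as component counts grow. It assumes independent components with exponentially distributed lifetimes, so that the system fails whenever any one component fails. The ten-year component MTBF and the component counts are hypothetical values chosen only for illustration, not figures from any real system.

```python
HOURS_PER_YEAR = 24 * 365


def system_mtbf_hours(component_mtbf_hours: float, n_components: int) -> float:
    """MTBF of a system that goes down when any one of n components fails.

    ASSUMPTION: failures are independent and exponentially distributed,
    so the failure rates simply add up.
    """
    return component_mtbf_hours / n_components


# Hypothetical component MTBF of ten years (illustrative only).
component_mtbf = 10 * HOURS_PER_YEAR

for n in (1_000, 100_000, 1_000_000):
    hours = system_mtbf_hours(component_mtbf, n)
    print(f"{n:>9,} components: system MTBF of about {hours:.3f} hours")
```

Under this simple model, every tenfold increase in component count cuts the expected time between failures tenfold, which is why a design that halts on any single failure becomes untenable at scale.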
I believe it is time for us to move
from our deus ex machina model
of explicitly managed resources to
a fully distributed, asynchronous
model that embraces component
failure as a standard occurrence. To
draw a biological analogy, we must
reason about systemic organism
health and behavior rather than cellular signaling and death, and not
allow cell death (component failure)
to trigger organism death (system
failure). Such a shift in world view
has profound implications for how
we structure the future of international high-performance computing research, academic/government/
industrial collaborations, and system
procurements.
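The idea of treating component failure as a standard occurrence can be illustrated with a toy sketch. It is a single-process simulation built on Python's asyncio, not a distributed runtime, and every name in it (ComponentFailure, run_on_component, resilient_task) is hypothetical. The failure pattern is contrived so that the behavior is repeatable.

```python
import asyncio


class ComponentFailure(Exception):
    """Raised when a (simulated) component dies mid-task."""


async def run_on_component(item: int, attempt: int) -> int:
    # Yield control to the event loop, standing in for real work.
    await asyncio.sleep(0)
    # Hypothetical failure model: every third item fails on its first attempt.
    if item % 3 == 0 and attempt == 0:
        raise ComponentFailure(f"component handling item {item} failed")
    return item * item


async def resilient_task(item: int, max_attempts: int = 3) -> int:
    # Treat failure as routine: retry the work rather than abort the whole job.
    for attempt in range(max_attempts):
        try:
            return await run_on_component(item, attempt)
        except ComponentFailure:
            continue
    raise RuntimeError(f"item {item} failed {max_attempts} times")


async def main() -> list[int]:
    # All tasks run concurrently; one task's failure does not stop the others.
    return await asyncio.gather(*(resilient_task(i) for i in range(6)))


results = asyncio.run(main())
print(results)
```

The point of the design is that a failure in one unit of work is absorbed locally by a retry, so the overall computation still completes. A real system would also need to detect failures, migrate state, and bound the cost of retries.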
Tessa Lau is a research staff member at IBM Almaden
Research Center in San Jose, CA. Daniel Reed is
director of scalable and multicore systems at Microsoft
Research in Redmond, WA.
© 2009 ACM 0001-0782/09/0500 $10.00