bound devices must also deal with this problem. Architects have proposed a cornucopia of techniques for dealing with faults, ranging from the radical, such as alternative processor designs2 and ways to use simultaneous multithreaded devices,24,39 to approaches more easily adopted by industry, such as cache designs with better fault resilience. Important work has also better characterized which parts of the microarchitecture are actually susceptible to dynamic faults.
Reliability continues to play an important role in architecture research, but the technology trends ahead look different. It is this author's opinion that Moore's Law will not stop anytime soon, but it won't be because we shrink feature sizes down to a handful of atoms in width.44 Rather, die-stacking will continue to provide ever
more chip real estate. These dies will
have a fixed (or even larger) feature size,
and thus the growth in dynamic faults
due to reduced feature sizes should
actually stop. Moreover, if multicore
does actually prove to be a market success, then reliability can be achieved
without enormous complexity: cores with manufacturing faults can be
mapped out, and for applications that
require high reliability, multiple cores
can be used to redundantly perform
the computation. Nevertheless, despite this positive long-term outlook,
work to improve reliability will always
have purpose, as improved reliability
leads directly to improved yields (and
in the future, improved performance if
redundant cores are not required), and
thus reduced costs.
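As a rough sketch of the redundant-execution idea (a generic illustration, not a description of any shipping mechanism), the Python fragment below runs the same computation on several cores and accepts the majority answer, masking a fault that corrupts any single copy. The names run_redundant and square_sum are invented for this example.

    # Hypothetical sketch: redundant execution across cores with majority voting.
    from collections import Counter
    from concurrent.futures import ProcessPoolExecutor

    def square_sum(n):
        # Stand-in for a computation that must be highly reliable.
        return sum(i * i for i in range(n))

    def run_redundant(fn, arg, copies=3):
        # Run the same computation on `copies` cores and majority-vote the
        # results, so a fault that corrupts one copy is outvoted.
        with ProcessPoolExecutor(max_workers=copies) as pool:
            results = list(pool.map(fn, [arg] * copies))
        value, votes = Counter(results).most_common(1)[0]
        if votes <= copies // 2:
            raise RuntimeError("no majority; retry the computation")
        return value

    if __name__ == "__main__":
        print(run_redundant(square_sum, 1_000_000))

The voting step is what lets faulty cores be tolerated rather than merely mapped out; the cost is running the work multiple times.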
Evaluation techniques. How architects do research has changed dramatically over the decades. When ISCA first
started in the 1970s, papers typically
provided paper designs and qualitative or simple analytical arguments for an idea's effectiveness. Research techniques changed significantly in the
early 1980s with the ability to simulate
new architecture proposals, and thus
provide quantitative evidence to back
up intuition. Simulation and quantitative approaches have their place, but
misused, they can provide an easy way
to produce a lot of meaningless but convincing-looking data. Sadly, it is now
commonly accepted in our community
that the absolute value of any data presented in a paper isn’t meaningful. We
take solace in the fact that the trends, the relative differences between two data points, likely have corresponding differences in the real world.
To an engineer, this approach to our field is sketchy but workable; to a scientist, it seems like a terrible
place to be. It is nearly impossible to
do something as simple as reproduce
the results in a paper. Doing so from
the paper alone requires starting from
the same simulation infrastructure as
the authors, implementing the idea
as the authors did, and then executing the same benchmarks, compiled with the same compiler with the same settings, as the authors.

[Figure 3: Papers published in ISCA 2001–2006, categorized by topic: Interconnect, Power, Multicore, Single-core, Alternatives to Single/Multicore, Specialized Processors, Emerging Technologies, Reliability.]

Starting from
scratch on this isn’t tractable, and the
only real way to reproduce a paper’s
results is to ask the authors to share
their infrastructure. Another, more
insidious problem with simulation is that it is too easy to make mistakes when
implementing a component model.
Because it is common, and even desirable, to separate functional ISA modeling from performance modeling,
such performance-model errors can
go unnoticed, thus leading to entirely
incorrect data and conclusions. Despite these drawbacks, quantitative
data is seductive to reviewers, and
simulation is the most labor-efficient
way to produce it.
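To see why such errors hide, consider the toy split-model simulator below (all names hypothetical and greatly simplified): a wrong latency in the timing model skews the reported cycle count, but the functional model still produces architecturally correct results, so ordinary correctness checks never trip.

    # Hypothetical sketch: functional execution separated from timing.
    class FunctionalModel:
        # Executes instructions and produces architecturally correct results.
        def __init__(self):
            self.regs = {}

        def execute(self, op, dst, a, b):
            if op == "add":
                self.regs[dst] = a + b
            elif op == "mul":
                self.regs[dst] = a * b

    class TimingModel:
        # Estimates cycles per instruction; a bug here never changes results.
        LATENCY = {"add": 1, "mul": 3}  # a wrong entry is easy to miss

        def cycles(self, op):
            return self.LATENCY[op]

    def simulate(program):
        func, timing = FunctionalModel(), TimingModel()
        total_cycles = 0
        for op, dst, a, b in program:
            func.execute(op, dst, a, b)
            total_cycles += timing.cycles(op)
        return func.regs, total_cycles

    if __name__ == "__main__":
        regs, cycles = simulate([("add", "r1", 2, 3), ("mul", "r2", 4, 5)])
        print(regs, cycles)  # registers are right even if the cycle count is not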
Looking forward, the picture is
muddled. Simulation will continue to
be the most important tool in the computer architect's toolbox. The need to model ever more parallel architectures, however, will force architects to keep exploring different modeling
techniques because, for the moment,
the tools used in computer architecture
research are built on single-threaded
code bases. Thus, simulating an exponentially increasing number of CPU
cores means an exponential increase
in simulation time. Fortunately, several paths forward exist. Work on high-level performance models13,28 provides accurate relative performance data quickly, suitable for coarsely mapping a design space. Sampling techniques35,41 enable architects to explore longer-running simulations with reasonable confidence. Finally,
renewed interest in prototyping and
using FPGAs for simulation40 will allow
architects to explore ideas that require
cooperation with language and application researchers, because FPGA-based simulation is just fast enough to be usable by software developers.
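As a rough illustration of the sampling idea (a generic sketch, not the specific techniques cited above), the fragment below estimates a workload's CPI by fully simulating only a random subset of fixed-size intervals and attaches a confidence interval to the estimate. The names detailed_simulation and sampled_cpi, and the noise model, are invented for this example.

    # Hypothetical sketch: estimate CPI from a random sample of intervals.
    import math
    import random
    import statistics

    def detailed_simulation(interval_id):
        # Placeholder for an expensive cycle-accurate simulation of one
        # interval; here it just returns a deterministic, noisy CPI value.
        random.seed(interval_id)
        return 1.2 + random.gauss(0, 0.15)

    def sampled_cpi(total_intervals, sample_size):
        sample = random.sample(range(total_intervals), sample_size)
        cpis = [detailed_simulation(i) for i in sample]
        mean = statistics.mean(cpis)
        stderr = statistics.stdev(cpis) / math.sqrt(sample_size)
        return mean, 1.96 * stderr  # point estimate and ~95% half-width

    if __name__ == "__main__":
        cpi, half = sampled_cpi(total_intervals=1_000_000, sample_size=200)
        print(f"estimated CPI = {cpi:.3f} +/- {half:.3f}")

Simulating a few hundred intervals instead of a million is what turns an intractable run into a tractable one, at the price of a quantified error bar.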
There are several advantages to
pre-built and shared tools for architecture research. They are enablers,
allowing research groups to avoid starting from scratch. Having shared tools has another benefit: the bugs and inaccuracies in those tools can be revealed and fixed over time. Shared tools also make it easier to re-create other people's work. There has been, and there will
continue to be, a downside to the avail-