ability of pre-built tools for architecture research, however. Just as SimpleScalar7 created a flood of research
on super-scalar microarchitecture,
the availability of pre-canned tools
and benchmarks for CMPs will create
a flood of research that is one delta
away from existing CMP designs. But
is this the type of research academics
should be conducting? As academics,
shouldn’t we be looking much farther
downfield, to the places where industry is not yet willing to go? This is an age-old quandary in our community, and the debate will likely continue for the foreseeable future.
In the computer architecture field
there is a cynical saying that goes
something like “we design tomorrow’s
systems with yesterday’s benchmarks.”
This author finds this statement extreme, but there is some underlying
merit to it. For example, there are far
more managed-code and scripting language developers out there than C/C++
ones. Yet the majority of benchmarks
used in our field are written in C. Fortunately, this is changing, with newer
benchmarks such as SPECjvm and
SPECjbb. Moreover, a few researchers
are starting to focus on performance
issues of managed and scripted code.
Looking forward, there is a very real need for realistic multithreaded benchmarks. Recent work suggests a kernel-driven approach is sufficient.33 As with architecture evaluation techniques generally, the jury is still out on the proper methodology.
Instruction-Level Parallelism.
Finally, a large number of architects,
myself among them, are still putting
enormous effort into finding additional instruction-level parallelism
(ILP). Some of these architects don’t
have complete faith that multicore
will be a success. Others recognize
that improvements in single-threaded performance benefit multicore as
well, as parts of applications will be
sequential or require a few threads to
execute quickly. Over the years, these
researchers have sought to find ILP in
every nook and cranny of the research
space. They have explored everything from new instruction set architectures and execution models to better branch predictors, caches, register management, and instruction scheduling. The list is endless.
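The point made earlier, that sequential portions of an application cap the benefit of multicore, is Amdahl's law in action. A quick back-of-the-envelope sketch (the standard formula; the numbers are illustrative, not from this article):

```python
# Amdahl's law: with parallel fraction p of the work and n cores,
# overall speedup = 1 / ((1 - p) + p / n).
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# Even with 90% of the work parallelized, 16 cores yield well under 16x:
print(round(amdahl_speedup(0.9, 16), 2))   # 6.4
```

This is why faster single-threaded execution of the remaining sequential 10% helps multicore systems too: shrinking the sequential fraction raises the ceiling on achievable speedup.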
Alongside the development of x86
microprocessors in the 1990s and
2000s, Intel and HP sank enormous effort and money into developing another line of processors, Itanium,22 designed to gain performance from ILP. Itanium is a Very Long Instruction Word (VLIW) processor, in the mold of far earlier work on the subject.15 Such processors promise performance from
ILP at reduced complexity compared
to superscalar designs, by relying on
sophisticated compilation technology.
VLIW is a fine idea; it communicates
more semantic knowledge about fine-grained parallelism from the software
to the hardware. If such an approach is technically useful, why don't you have an Itanium processor on your desktop? In a nutshell, such processors never achieved a price point that
fit well in the commodity PC market.
Moreover, in order to maintain binary
compatibility with x86, sophisticated
binary translation mechanisms had
to be employed. After such translation, existing code saw little to no
performance benefit from executing
on Itanium. Consumers were loath to
spend more on a system that was no
faster, if not slower, than the cheaper
alternative, for the promise that someday faster native-code applications
would arrive. There is a lesson here
for multicore systems as well: without
tangible benefits, consumers will not
spend money on new hardware just
for its technical superiority.
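The VLIW idea described above, a compiler statically packing independent operations into wide issue bundles, can be illustrated with a toy list scheduler. This is a hypothetical, greatly simplified sketch; real VLIW compilers are vastly more sophisticated:

```python
# Toy VLIW-style list scheduler (illustrative only): pack operations
# into fixed-width "bundles" such that no operation issues before the
# bundle after its latest dependency. Each op is (name, set_of_deps).

def schedule(ops, width=3):
    """Greedily place each op in the earliest bundle that follows all
    of its dependencies, subject to the issue-width limit."""
    bundles = []   # list of bundles; each bundle is a list of op names
    placed = {}    # op name -> index of the bundle it was placed in
    for name, deps in ops:
        # earliest legal bundle: one past the latest dependency
        earliest = max((placed[d] + 1 for d in deps), default=0)
        i = earliest
        while i < len(bundles) and len(bundles[i]) >= width:
            i += 1  # bundle is full; try the next one
        if i == len(bundles):
            bundles.append([])
        bundles[i].append(name)
        placed[name] = i
    return bundles

# a, b, c are independent; d needs a and b; e needs d
program = [("a", set()), ("b", set()), ("c", set()),
           ("d", {"a", "b"}), ("e", {"d"})]
print(schedule(program))   # [['a', 'b', 'c'], ['d'], ['e']]
```

The hardware then issues each bundle's operations in parallel without checking dependencies at runtime; that is the "semantic knowledge" the compiler communicates, and also why performance hinges on how much parallelism the compiler can actually find.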
Does ILP still matter? This author
would argue it does. As mentioned
earlier, even parallel programs have
sequential parts. Legacy code still matters to the IT industry. Is there more
ILP to be had? This is a more difficult
question to answer. The seminal work
in this area21 suggests there is. Extracting it from applications, however, is no
trivial matter. The low-hanging fruit
was gone before I even entered the
field! Aggressive speculation is required to address the memory wall, the inherent difficulty of predicting certain branches, and the false control and memory dependencies introduced by the imperative programming model. This must be carried out by
architectures that are simple to design
and validate, lack monolithic control