the compound rates, you could argue that if performance
is the total number of transistor transitions per second—
that’s a reasonable proxy—and call that capability, then
that’s the compound of the Moore’s law transistor count
and the clock-rate increase. Those two things together are
the capability rate, and they have been going up roughly
1.7 to 1.8 times per year, until the past few years, for quite a
long time.
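To make the compounding concrete, here is a small Python sketch that treats capability as transistor count times clock rate, so the annual capability factor is the product of the two annual growth factors. The input rates are assumptions chosen for illustration, not figures from the discussion: transistor count doubling every two years (about 1.41x per year) and clock rate rising 25 percent per year. Under those assumptions the product lands inside the 1.7 to 1.8 range mentioned above.

```python
import math


def capability_rate(transistor_growth: float, clock_growth: float) -> float:
    """Annual capability growth factor, with capability taken to be
    transistor transitions per second (transistor count x clock rate)."""
    return transistor_growth * clock_growth


def compound(rate: float, years: int) -> float:
    """Total growth factor after compounding `rate` for `years` years."""
    return rate ** years


# Assumed inputs for illustration only (not numbers from the discussion):
transistor_growth = math.sqrt(2.0)  # transistor count doubles every two years
clock_growth = 1.25                 # clock rate rises 25 percent per year

rate = capability_rate(transistor_growth, clock_growth)
print(f"capability rate: {rate:.2f}x per year")

# Cumulative capability growth at that annual rate.
for years in (5, 10, 20):
    print(f"{years:2d} years: {compound(rate, years):,.0f}x total")
```

Because the two factors multiply, a modest-looking clock improvement on top of Moore's-law transistor growth is what pushes the ideal curve well above 1.5x per year.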
That’s how much faster you would expect an idealized
thing to get year over year if the people doing it weren’t
getting any smarter, if they weren’t learning anything.
Indeed, GPUs have been
getting faster by some metrics—not all, but by some—at a
rate a little bit faster than that
capability rate. So, we can say
that their designers have been
getting smarter.
CPUs are intrinsically
sequential, which means they
have a single thread of execution. The transistors didn’t go
to more computation; they
went to all kinds of cleverness
to feed that one engine faster.
It’s an interesting historical quirk that for a while the
increase worked out close enough to a 1.5 compound
growth rate that people started calling that Moore’s law,
but it’s not the same thing.
PH This is really important for people to realize. You had
this potential for CPUs to go, say, 75 percent faster every
year, but they got only 50 percent faster. That means they
were losing 25 percent a year relative to what they could have
achieved, every year since the dawn of the microprocessor.
Not only that, 25 percent of the 50 percent was for free
because the clock got faster. So they had 50 percent more
area or more transistors. They used only 25 percent of the
capability of those extra transistors. Fewer than half of
their extra transistors were turning into anything useful.
That sounds like bad engineering to me.
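The accounting above can be checked with a few lines of Python. This sketch follows the speaker's simple additive bookkeeping (75 percent potential, 50 percent achieved, 25 percent of that from the clock) to get the share of the transistor-driven gain that was actually realized, then shows how the gap between a 1.75x and a 1.5x annual rate compounds over time. The function name `realized_share` is hypothetical, and treating the percentages additively is a simplification rather than a precise performance model.

```python
def realized_share(potential: float, achieved: float, from_clock: float) -> float:
    """Fraction of the transistor-driven part of the potential annual gain that
    was actually realized, treating the percentages additively as in the text."""
    transistor_potential = potential - from_clock  # gain the extra transistors could give
    transistor_achieved = achieved - from_clock    # gain they actually delivered
    return transistor_achieved / transistor_potential


# Percentages from the discussion: 75% possible, 50% achieved, 25% from the clock.
share = realized_share(potential=0.75, achieved=0.50, from_clock=0.25)
print(f"share of extra-transistor capability used: {share:.0%}")  # 50%

# How a 1.75x-per-year potential pulls away from a 1.5x-per-year reality.
for years in (5, 10, 20):
    shortfall = (1.75 / 1.5) ** years
    print(f"{years:2d} years: {shortfall:.1f}x behind the capability curve")
```

The per-year loss looks small, but because it compounds, the gap between what the transistors could have delivered and what CPUs actually delivered grows geometrically with time.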
When I used to consult at SGI, Kurt told me that if we
turn only half of our new transistors into performance,
we haven’t done our job as engineers. Our goal as engineers is to use our resources fully. Since the dawn of the
microprocessor, however, we’ve been throwing away half
our transistors. That’s just another way of saying how
inefficient CPUs have become.
Now, when GPUs hit the market, they got a performance increase of about a factor of 20 over CPUs. One
way of thinking about it is GPUs put us back on the
Moore’s law curve—not the number-of-transistors one,
but the real capability curve. CPUs have never been on
that curve.
KA And GPUs have arguably exceeded it, but when you
look carefully at the numbers, the bandwidths aren’t
going up at those rates. There’s more compression. Some
trickery and clever engineering have made them get faster
by a bit. Plus, the raw capability gives you this huge
disparity between GPU peak performance on problems that
are suited to them and what you can get on a CPU.
The interesting thing is that people in the CPU world
are not sitting on their hands anymore. As soon as they
made the decision to go parallel, the gloves came off.
They’re going to stop squandering all those transistors
on trying to make one thread go incrementally faster,
and they’re going to start using them to make a bunch of
threads go faster. This puts them potentially on the same
curve as GPUs. One prediction you might make is that
this disparity is going to stop changing as quickly as it
has for the past 20 years or so.
TD When you get into this sort of architectural discussion, the first question that always has to come up is,
where’s the bottleneck? Here’s where I see the problem
right now: if I have a nice piece of silicon with 64 or 128
cores on it, and it’s only got a few hundred or a thousand
pins, there’s still a serious communication problem off-