the compound rates, you could argue that if performance is the total number of transistor transitions per second (that’s a reasonable proxy), and you call that capability, then that’s the compound of the Moore’s law transistor count and the clock-rate increase. Those two things together are the capability rate, and they have been going up by a factor of roughly 1.7 to 1.8 per year, until the past few years, for quite a long time.
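As a rough illustration of how the two rates compound, here is a minimal Python sketch. The inputs are assumptions made for the example, not figures given at this point in the conversation: transistor count doubling every two years, and clock rate rising 25 percent per year. The function name `capability_rate` is likewise hypothetical.

```python
def capability_rate(transistor_growth: float, clock_growth: float) -> float:
    """Yearly growth factor in transistor transitions per second.

    Capability is taken as (number of transistors) x (clock rate),
    so the yearly growth factors multiply.
    """
    return transistor_growth * clock_growth


# Assumption: transistor count doubles every two years,
# which is a factor of sqrt(2) per year.
transistor_growth = 2 ** (1 / 2)

# Assumption: clock rate rises 25 percent per year.
clock_growth = 1.25

rate = capability_rate(transistor_growth, clock_growth)
print(f"capability grows about {rate:.2f}x per year")
```

Under these assumed inputs the product falls inside the 1.7 to 1.8 range mentioned above; different inputs would move it.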

That’s how much faster you would expect an idealized thing to get year over year if the people doing it weren’t getting any smarter, if they weren’t learning anything. Indeed, GPUs have been getting faster by some metrics (not all, but some) at a rate a little bit faster than that capability rate. So, we can say that their designers have been getting smarter.

CPUs are intrinsically sequential, which means they have a single thread of execution. The transistors didn’t go to more computation; they went to all kinds of cleverness to feed that one engine faster. It’s an interesting historical quirk that for a while the increase worked out close enough to a 1.5 compound growth rate that people started calling that Moore’s law, but it’s not the same thing.

PH This is really important for people to realize. You had this potential for CPUs to go, say, 75 percent faster every year, but they got only 50 percent faster. That means they were losing 25 percent a year relative to what they could have achieved, every year since the dawn of the microprocessor. Not only that, 25 percent of the 50 percent was for free because the clock got faster. So they had 50 percent more area or more transistors. They used only 25 percent of the capability of those extra transistors. Fewer than half of their extra transistors were turning into anything useful. That sounds like bad engineering to me.

When I used to consult at SGI, Kurt told me that if we turn only half of our new transistors into performance, we haven’t done our job as engineers. Our goal as engineers is to use our resources fully. Since the dawn of the microprocessor, however, we’ve been throwing away half our transistors. That’s just another way of saying how inefficient CPUs have become.

Now, when GPUs hit the market, they got a performance increase of about a factor of 20 over CPUs. One way of thinking about it is GPUs put us back on the Moore’s law curve: not the number-of-transistors one, but the real capability curve. CPUs have never been on that curve.

KA And GPUs have arguably exceeded it, but when you look carefully at the numbers, the bandwidths aren’t going up at those rates. There’s more compression. Some trickery and clever engineering have made them get faster by a bit. Plus, the raw capability gives you this huge disparity between GPU peak performance on problems that are suited to them and what you can get on a CPU.
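To see how a modest yearly shortfall can grow into a large disparity, here is a minimal Python sketch that compounds the gap between the 75 percent potential and the 50 percent actual yearly gains quoted earlier. The 20-year span and the function name `cumulative_gap` are assumptions chosen for illustration; the speakers do not derive their factor of 20 this way.

```python
def cumulative_gap(potential: float, actual: float, years: int) -> float:
    """Ratio of potential to actual performance after compounding.

    Both arguments are yearly growth factors (for example 1.75 for
    a 75 percent gain). The ratio between them compounds every year.
    """
    return (potential / actual) ** years


# Growth factors taken from the conversation: 75 percent potential
# versus 50 percent actual improvement per year.
potential_growth = 1.75
actual_growth = 1.5

# Assumption: a 20-year span, chosen only for illustration.
years = 20

gap = cumulative_gap(potential_growth, actual_growth, years)
print(f"after {years} years the gap is about {gap:.1f}x")
```

With these inputs the compounded gap comes out at a little over 20x, the same order as the factor quoted above. That agreement is suggestive rather than a derivation, since the result is sensitive to both the rates and the span assumed.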

The interesting thing is that people in the CPU world are not sitting on their hands anymore. As soon as they made the decision to go parallel, the gloves came off. They’re going to stop squandering all those transistors on trying to make one thread go incrementally faster, and they’re going to start using them to make a bunch of threads go faster. This puts them potentially on the same curve as GPUs. One prediction you might make is that this disparity is going to stop changing as quickly as it has for the past 20 years or so.

TD When you get into this sort of architectural discussion, the first question that always has to come up is, where’s the bottleneck? Here’s where I see the problem right now: if I have a nice piece of silicon with 64 or 128 cores on it, and it’s only got a few hundred or a thousand pins, there’s still a serious communication problem off-
