The modern GPU is a versatile processor that constitutes an extreme but compelling point in the growing
space of multicore parallel computing architectures.
These platforms, which include GPUs, the STI Cell
Broadband Engine, the
Sun UltraSPARC T2, and,
increasingly, multicore x86
systems from Intel and
AMD, differentiate themselves from traditional
CPU designs by prioritizing
high-throughput processing of many parallel operations over the low-latency
execution of a single task.
GPUs assemble a large
collection of fixed-function
and software-programmable processing resources.
Impressive statistics, such
as ALU (arithmetic logic
unit) counts and peak
floating-point rates, often
emerge during discussions
of GPU design. Despite the
inherently parallel nature
of graphics, however, efficiently mapping common
rendering algorithms onto
GPU resources is extremely
challenging.
The key to high performance lies in strategies
that hardware components
and their corresponding
software interfaces use
to keep GPU processing
resources busy. GPU designs go to great lengths to obtain
high efficiency, conveniently reducing the difficulty programmers face when programming graphics applications.
As a result, GPUs deliver high performance and expose
an expressive but simple programming interface. This
interface remains largely devoid of explicit parallelism or
asynchronous execution and has proven to be portable
across vendor implementations and generations of GPU
designs.
At a time when the shift toward throughput-oriented
CPU platforms is prompting alarm about the complexity
of parallel programming, understanding key ideas behind
the success of GPU computing is valuable not only for
developers targeting software for GPU execution, but
also for informing the design of new architectures and
programming systems for other domains. In this article,
we dive under the hood of a modern GPU to look at why
[Figure: A Simplified Graphics Pipeline. Stages in order: vertex generation (VG), vertex processing (VP), primitive generation (PG), primitive processing (PP), fragment generation (FG), fragment processing (FP), pixel operations (PO). Memory resources shown alongside the stages: vertex descriptors, vertex data buffers, global buffers, vertex topology, textures, and the output image. The legend distinguishes fixed-function stages from shader-program-defined stages.]