Contributed Articles
DOI: 10.1145/1839676.1839694

Understanding Throughput-Oriented Architectures

By Michael Garland and David B. Kirk
NVIDIA’s graphics processing units, or
GPUs, follow in the footsteps of earlier
throughput-oriented processor designs
but have achieved far broader use in
commodity machines. Broadly speaking, they focus on executing parallel
workloads and attempt to maximize total throughput, even if doing so
sacrifices the serial performance of a
single task. Though
improving total throughput at the expense of increased latency on individual
tasks is not always a desirable trade-off,
it is unquestionably the right design decision in many problem domains that
rely on parallel computations, including real-time computer graphics, video
processing, medical-image analysis,
molecular dynamics, astrophysical simulation, and gene sequencing.
Modern GPUs are fully programmable and designed to meet the needs of a
problem domain—real-time computer
graphics—with tremendous inherent
parallelism. Furthermore, real-time
graphics places a premium on the total amount of work that can be accomplished within the span of a single frame
(typically lasting 1/30 second). Due to
their historical development, GPUs have
evolved as exemplars of throughput-oriented processor architecture. Their
emphasis on throughput optimization and their expectation of abundant
available parallelism are more aggressive
than in many other throughput-oriented
architectures. They are also widely available and easily programmable.
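As a concrete illustration of the programming style these architectures favor, consider a minimal CUDA sketch (the names and launch parameters here are illustrative, not drawn from the article): a SAXPY kernel in which each of the n independent element updates is assigned to its own thread, exposing the abundant parallelism a GPU needs to deliver high throughput.

```cuda
// Each thread computes one element of y = a*x + y.
__global__ void saxpy(int n, float a, const float *x, float *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                      // guard: n may not be a multiple of the block size
        y[i] = a * x[i] + y[i];
}

// Host side: launch enough 256-thread blocks to cover all n elements.
// Every iteration of the equivalent serial loop becomes an independent
// thread; the hardware sustains throughput by interleaving thousands of
// such threads rather than minimizing the latency of any one of them.
void saxpy_on_device(int n, float a, const float *d_x, float *d_y)
{
    int blocks = (n + 255) / 256;
    saxpy<<<blocks, 256>>>(n, a, d_x, d_y);
}
```

The design point is visible even in this toy: the program supplies far more threads than the machine has cores, and total throughput, not per-thread latency, is what the hardware optimizes.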
For workloads with abundant parallelism,
GPUs deliver higher peak computational
throughput than latency-oriented CPUs.
Much has been written about the transition of
commodity microprocessors from single-core to
multicore chips, a trend most apparent in CPU
processor families. Commodity PCs are now typically
built with CPUs containing from two to eight cores,
with even higher core counts on the horizon. These
chips aim to deliver higher performance by exploiting
modestly parallel workloads arising from either the
need to execute multiple independent programs
or individual programs that themselves consist of
multiple parallel tasks, yet maintain the same level
of performance as single-core chips on sequential
workloads.
A related architectural trend is the growing
prominence of throughput-oriented microprocessor
architectures. Processors like Sun’s Niagara and
Key Insights
Throughput-oriented processors
tackle problems where parallelism is
abundant, yielding design decisions
different from those of more traditional latency-oriented processors.

Due to their design, programming
throughput-oriented processors
requires much more emphasis on
parallelism and scalability than
programming sequential processors.

GPUs are the leading exemplars
of modern throughput-oriented
architecture, providing a ubiquitous
commodity platform for exploring
throughput-oriented programming.