in the near term different hardware vendors are taking
different approaches. Software programmers should think
carefully about these issues so that they are prepared to
influence the debate.
Which approach is most likely to dominate in the
medium to long term? I have previously argued that the
trend in rendering algorithms is toward those that build
and traverse irregular data structures. These irregular data
structures allow algorithms to adapt to the scene geometry and the current viewpoint. Explicitly managing all
data locality for these algorithms is painful, especially if
multiple cores share a read/write data structure. In my
experience, it is easier to develop these algorithms on a
cache-coherent architecture, even if achieving optimal
performance often still requires thinking very carefully
about the communication and memory-access patterns of
the performance-critical kernels.
For these and other reasons too detailed to discuss
here, I believe that future graphics architectures will
efficiently support a cache-coherent memory model, and
that any architecture lacking these capabilities will be a
second choice at best for programmers who are developing innovative rendering techniques. Sun’s Niagara architecture provides a good preview of the kind of memory
and threading model that I anticipate for future GPUs. I
also expect, however, that cache-coherent graphics architectures will include a variety of mechanisms that provide
the programmer with explicit control over communication and memory access, such as streaming loads that
bypass the cache.
FINE-GRAINED SPECIALIZATION
The desire to support greater algorithmic diversity will
drive future graphics architectures toward greater flexibility and generality, but specialization will still be used
where it provides a sufficiently large benefit for the majority of applications. Most of this specialization will be at
a fine granularity, used to accelerate specific operations,
in contrast to the coarse, monolithic specialization that in the
past dictated the overall structure of the algorithms executed on
the hardware.
In particular, I expect the following forms of specialization
to persist in graphics architectures:
Texture hardware. Texture addressing and filtering
operations use low-precision (typically 16-bit) values that
are decompressed on the fly from a compressed representation stored in memory. The amount of data accessed is
large, so multithreading is needed to tolerate the resulting
cache misses. These operations are a significant fraction
of the overall rendering cost and benefit enormously from
specialized hardware.
Specialized floating-point operations. Rendering
makes heavy use of floating-point square-root and reciprocal operations. Current graphics hardware provides
high-performance instructions for these operations, as
well as other operations used for shading such as swizzling and trigonometric functions. Future graphics hardware will need to do the same.
Video playback and desktop compositing. Video playback and 2D and 2.5D desktop window operations benefit
significantly from specialized hardware. Specialization of
these operations is especially important for power efficiency. I anticipate that much of this hardware will follow
the traditional coarse-grained monolithic fixed-function
model and thus will not be useful for user-written 3D
graphics programs.
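As an illustration of the specialized floating-point operations described above, the following minimal sketch shows where a reciprocal square root typically appears in shading: normalizing vectors before a diffuse lighting calculation. The function names (`rsqrt`, `normalize`, `lambert`) are hypothetical, and Python's `math.sqrt` merely stands in for a dedicated hardware instruction.

```python
import math

def rsqrt(x):
    """Reciprocal square root; a stand-in for a dedicated hardware instruction."""
    return 1.0 / math.sqrt(x)

def normalize(v):
    """Scale a 3D vector to unit length using one rsqrt and three multiplies."""
    x, y, z = v
    inv_len = rsqrt(x * x + y * y + z * z)
    return (x * inv_len, y * inv_len, z * inv_len)

def lambert(normal, to_light):
    """Diffuse term: clamped dot product of the two normalized vectors."""
    n = normalize(normal)
    l = normalize(to_light)
    return max(0.0, n[0] * l[0] + n[1] * l[1] + n[2] * l[2])
```

Because every shaded point may normalize several vectors, the cost of this one operation is multiplied across the whole image, which is consistent with the case for accelerating it in hardware.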
Current graphics hardware also includes specialized
units that assist with triangle rasterization, but I expect
that this task will be taken over by software within a
few years. The reason is that rasterization is gradually
becoming a smaller fraction of total rendering costs, so
the penalty for implementing it in software is decreasing.
This trend will accelerate as more sophisticated visibility
algorithms supplement or replace the Z buffer.
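To give a sense of what rasterization in software involves, here is a minimal sketch of one common approach, the edge-function test applied to pixel centers within the triangle's bounding box. It is an illustration under simplifying assumptions, not a description of any vendor's implementation: the names `edge` and `rasterize` are hypothetical, coordinates are in pixel units, and no fill rule is applied to pixels lying exactly on a shared edge.

```python
import math

def edge(ax, ay, bx, by, px, py):
    """Signed area test: positive when P lies to the left of the edge A->B."""
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)

def rasterize(tri, width, height):
    """Return the set of (x, y) pixels whose centers fall inside the triangle."""
    (x0, y0), (x1, y1), (x2, y2) = tri
    area = edge(x0, y0, x1, y1, x2, y2)
    if area == 0:
        return set()  # degenerate triangle covers no pixels

    # Clamp the triangle's bounding box to the framebuffer.
    min_x = max(0, int(math.floor(min(x0, x1, x2))))
    max_x = min(width - 1, int(math.floor(max(x0, x1, x2))))
    min_y = max(0, int(math.floor(min(y0, y1, y2))))
    max_y = min(height - 1, int(math.floor(max(y0, y1, y2))))

    covered = set()
    for y in range(min_y, max_y + 1):
        for x in range(min_x, max_x + 1):
            px, py = x + 0.5, y + 0.5  # sample at the pixel center
            w0 = edge(x1, y1, x2, y2, px, py)
            w1 = edge(x2, y2, x0, y0, px, py)
            w2 = edge(x0, y0, x1, y1, px, py)
            # Accept either winding order by matching signs against the area.
            if area > 0:
                inside = w0 >= 0 and w1 >= 0 and w2 >= 0
            else:
                inside = w0 <= 0 and w1 <= 0 and w2 <= 0
            if inside:
                covered.add((x, y))
    return covered
```

The inner loop is a handful of multiplies and comparisons per pixel with no dependence between pixels, which is why it maps readily onto general-purpose parallel cores.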
As graphics software switches to more powerful visibility algorithms such as ray tracing, it may become clear
that certain operations represent a sufficiently large portion of the total computation cost that hardware acceleration would be justified. For example, future architectures
could include specialized instructions to accelerate the
data-structure traversal operations used by ray tracing.
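The kind of data-structure traversal referred to here can be sketched as a stack-based walk over a bounding volume hierarchy, with a slab test against each node's axis-aligned box. This is a minimal sketch under stated assumptions: the `Node` class, `hits_box`, and `traverse` are hypothetical names, the hierarchy is built by hand, and the leaves hold opaque items rather than triangles.

```python
class Node:
    """BVH node: an axis-aligned box plus either child nodes or leaf items."""
    def __init__(self, lo, hi, left=None, right=None, items=()):
        self.lo, self.hi = lo, hi
        self.left, self.right = left, right
        self.items = items

def hits_box(origin, direction, lo, hi):
    """Slab test: does the ray origin + t * direction (t >= 0) touch the box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:
                return False  # parallel to this slab and outside it
            continue
        t0, t1 = (a - o) / d, (b - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(root, origin, direction):
    """Return the items of every leaf whose box the ray touches."""
    found = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not hits_box(origin, direction, node.lo, node.hi):
            continue  # prune this whole subtree
        if node.left is None and node.right is None:
            found.extend(node.items)
        else:
            stack.extend(c for c in (node.left, node.right) if c is not None)
    return found
```

The box test and the push/pop of child nodes dominate this loop, so they are the natural candidates for the specialized instructions suggested above.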
THE CHALLENGE FOR GRAPHICS ARCHITECTS
At a high level, the key challenge facing future graphics
architectures is to strike the best balance between the
desire to provide high performance on existing graphics algorithms and the desire to provide the flexibility
needed to support new algorithms with high perfor-