GPU Ray Tracing
By Steven G. Parker, Heiko Friedrich, David Luebke, Keith Morley, James Bigler, Jared Hoberock, David McAllister,
Austin Robison, Andreas Dietrich, Greg Humphreys, Morgan McGuire, and Martin Stich
Abstract
The NVIDIA® OptiX™ ray tracing engine is a programmable
system designed for NVIDIA GPUs and other highly parallel architectures. The OptiX engine builds on the key
observation that most ray tracing algorithms can be implemented using a small set of programmable operations.
Consequently, the core of OptiX is a domain-specific just-in-time compiler that generates custom ray tracing kernels
by combining user-supplied programs for ray generation,
material shading, object intersection, and scene traversal.
This enables the implementation of a highly diverse set of
ray tracing-based algorithms and applications, including
interactive rendering, offline rendering, collision detection
systems, artificial intelligence queries, and scientific simulations such as sound propagation. OptiX achieves high performance through a compact object model and application of
several ray tracing-specific compiler optimizations. For ease
of use it exposes a single-ray programming model with full
support for recursion and a dynamic dispatch mechanism
similar to virtual function calls.
1. Introduction
Many CS undergraduates have taken a computer graphics
course where they wrote a simple ray tracer. With a few
simple concepts on the physics of light transport, students
can achieve high quality images with reflections, refraction,
shadows, and camera effects such as depth of field—all of
which present challenges on contemporary real-time graphics pipelines. Unfortunately, the computational burden of
ray tracing makes it impractical in many settings, especially
where interactivity is important. Researchers have invented
many techniques for improving the performance of ray tracing,13 especially when mapped to high-performance architectural features such as explicit SIMD instructions12 and
Single-Instruction Multiple-Thread (SIMT)-based6 GPUs.1
Unfortunately, most such techniques muddy the simplicity
and conceptual purity that make ray tracing attractive. Nor
have industry standards emerged to hide these complexities, as Direct3D and OpenGL do for rasterization.
To address these problems, we introduce OptiX, a general
purpose ray tracing engine. A general programming interface enables the implementation of a variety of ray tracing-based algorithms in graphics and non-graphics domains,
such as rendering, sound propagation, collision detection,
and artificial intelligence. This interface is conceptually
simple yet enables high performance on modern GPU architectures and is competitive with hand-coded approaches.
In this paper, we discuss the design goals of the OptiX
engine as well as an implementation for NVIDIA GPUs. In
our implementation, we compose domain-specific compilation
with a flexible set of controls over scene hierarchy,
acceleration structure creation and traversal, on-the-fly
scene update, and a dynamically load-balanced GPU execution
model. Although OptiX primarily targets highly parallel
GPU architectures, it is applicable to a wide range of special-
and general-purpose hardware, including modern CPUs.
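The idea of assembling a ray tracer from a small set of user-supplied programs can be illustrated with a minimal CPU sketch. It is not the OptiX API: every name here (`Primitive`, `sphere_intersect`, `flat_shade`, `miss`, `trace`, `ray_generation`) is hypothetical, the linear loop in `trace` merely stands in for traversal of a real acceleration structure, and the "programs" are plain Python callables rather than compiled GPU kernels. The sketch shows only how per-object intersection and shading programs can be dispatched in the manner of virtual function calls.

```python
import math
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


@dataclass
class Ray:
    origin: Vec3
    direction: Vec3  # assumed to be unit length


@dataclass
class Primitive:
    # Each object carries its own programs (hypothetical layout); trace()
    # calls whichever ones are attached, much like a virtual function call.
    intersect: Callable[..., Optional[float]]
    shade: Callable[..., str]
    data: dict


def sphere_intersect(prim: Primitive, ray: Ray) -> Optional[float]:
    """Intersection program: distance to the near sphere hit, or None."""
    oc = sub(ray.origin, prim.data["center"])
    b = dot(oc, ray.direction)
    c = dot(oc, oc) - prim.data["radius"] ** 2
    disc = b * b - c
    if disc < 0.0:
        return None
    t = -b - math.sqrt(disc)  # near root only; origins inside the sphere miss
    return t if t > 1e-6 else None


def flat_shade(prim: Primitive, ray: Ray, t: float) -> str:
    """Material shading program: return the object's stored color."""
    return prim.data["color"]


def miss(ray: Ray) -> str:
    """Program invoked when a ray hits nothing."""
    return "background"


def trace(scene, ray: Ray, miss_program=miss) -> str:
    """Find the nearest hit, then dispatch to that object's shading program."""
    best_t, best_prim = math.inf, None
    for prim in scene:  # brute-force stand-in for scene traversal
        t = prim.intersect(prim, ray)
        if t is not None and t < best_t:
            best_t, best_prim = t, prim
    if best_prim is None:
        return miss_program(ray)
    return best_prim.shade(best_prim, ray, best_t)


def ray_generation(scene, x_offsets):
    """Ray generation program: one ray per offset, looking down +z."""
    return [trace(scene, Ray((x, 0.0, 0.0), (0.0, 0.0, 1.0))) for x in x_offsets]


scene = [
    Primitive(sphere_intersect, flat_shade,
              {"center": (0.0, 0.0, 5.0), "radius": 1.0, "color": "red"}),
    Primitive(sphere_intersect, flat_shade,
              {"center": (0.0, 0.0, 3.0), "radius": 1.0, "color": "blue"}),
]
```

Calling `ray_generation(scene, [0.0, 5.0])` traces one ray through both spheres and one ray that passes beside them. A shading program could call `trace` again to spawn reflection or shadow rays, which is where recursion would enter in this toy model.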
1.1. Ray tracing, rasterization, and GPUs
Computer graphics algorithms for rendering, or image
synthesis, take one of two complementary approaches. One
family of algorithms loops over the pixels in the image, computing for each pixel the first object visible at that pixel; this
approach is called ray tracing because it solves the geometric
problem of intersecting a ray from the pixel with the objects.
A second family of algorithms loops over the objects in the
scene, computing for each object the pixels covered by that
object. Because the resulting per-object pixels (called
fragments) are formatted for a raster display, this approach is
called rasterization. The central data structure of ray tracing is a spatial index called an acceleration structure, used to
avoid testing each ray against all objects. The central data
structure of rasterization is the depth buffer, which stores the
distance of the closest object seen at each pixel and discards
fragments from invisible objects. While both approaches
have been generalized and optimized greatly beyond this
simplistic description, the basic distinction remains: ray
tracing iterates over rays while rasterization iterates over
objects. High-performance ray tracing and rasterization
both focus on rendering the simplest of objects: triangles.
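The difference in loop order can be made concrete with a toy sketch. It assumes an orthographic view in which each pixel's ray travels straight into the screen, and it uses a hypothetical `Rect` primitive (an axis-aligned, camera-facing rectangle at a fixed depth) in place of triangles. The ray tracing loop tests every object by brute force where a real tracer would consult an acceleration structure; the rasterization loop uses a depth buffer to discard hidden fragments.

```python
import math
from dataclasses import dataclass

WIDTH, HEIGHT = 8, 4
BACKGROUND = "."


@dataclass(frozen=True)
class Rect:
    """Camera-facing rectangle (toy primitive); x1 and y1 are exclusive."""
    x0: int
    y0: int
    x1: int
    y1: int
    depth: float
    color: str


def ray_trace(scene):
    """Outer loop over pixels: find the nearest object along each pixel's ray."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    for y in range(HEIGHT):
        for x in range(WIDTH):
            nearest = math.inf
            # Brute force: an acceleration structure would skip most tests.
            for obj in scene:
                hit = obj.x0 <= x < obj.x1 and obj.y0 <= y < obj.y1
                if hit and obj.depth < nearest:
                    nearest = obj.depth
                    image[y][x] = obj.color
    return image


def rasterize(scene):
    """Outer loop over objects: emit fragments and resolve with a depth buffer."""
    image = [[BACKGROUND] * WIDTH for _ in range(HEIGHT)]
    depth_buffer = [[math.inf] * WIDTH for _ in range(HEIGHT)]
    for obj in scene:
        # Visit only the pixels this object covers, clipped to the image.
        for y in range(max(obj.y0, 0), min(obj.y1, HEIGHT)):
            for x in range(max(obj.x0, 0), min(obj.x1, WIDTH)):
                if obj.depth < depth_buffer[y][x]:  # depth test
                    depth_buffer[y][x] = obj.depth
                    image[y][x] = obj.color
    return image


scene = [
    Rect(0, 0, 5, 3, depth=2.0, color="A"),
    Rect(3, 1, 8, 4, depth=1.0, color="B"),  # nearer, overlaps A
]

for row in ray_trace(scene):
    print("".join(row))
```

Both functions produce the same image for this scene; only the order of iteration and the supporting data structure differ.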
Historically, ray tracing has been considered slow and
rasterization fast. The simple, regular structure of depth-buffer
rasterization lends itself to highly parallel hardware
implementations: each object moves through several stages
of computation (the so-called graphics pipeline), with each
stage performing similar computations in data-parallel
fashion on the many objects, fragments, and pixels in flight
throughout the pipeline. As graphics hardware has grown
more parallel it has also grown more general, evolving from
specialized fixed-function circuitry implementing the various
stages of the graphics pipeline into fully programmable
processors that virtualize those stages onto hundreds or even
thousands of small general-purpose cores. Today’s graphics
processing units, or GPUs, are massively parallel processors
capable of performing trillions of floating-point math operations
and rendering billions of triangles each second. The
computational horsepower and power efficiency of modern
GPUs have made them attractive for high-performance
The original version of this paper is entitled “OptiX: A General
Purpose Ray Tracing Engine” and was published in ACM
Transactions on Graphics (TOG)—Proceedings of ACM
SIGGRAPH, July 2010, ACM