The programming paradigm provided by CUDA has
allowed developers to harness the power of these scalable parallel processors with relative ease, enabling them
to achieve speedups of 100 times or more on a variety of applications.
The CUDA abstractions, however, are general and
provide an excellent programming environment for multicore CPU chips. A prototype source-to-source translation
framework developed at the University of Illinois compiles CUDA programs for multicore CPUs by mapping
a parallel thread block to loops within a single physical
thread. CUDA kernels compiled in this way exhibit excellent performance and scalability.12
Although CUDA was released less than a year ago, it is
already the target of massive development activity: there
are tens of thousands of CUDA developers. The combination of massive speedups, an intuitive programming
environment, and affordable, ubiquitous hardware is rare
in today's market. In short, CUDA represents a democratization of parallel programming.
1. NVIDIA. 2007. CUDA Technology; http://www.nvidia.com/CUDA.
2. NVIDIA. 2007. CUDA Programming Guide 1.1; http://
NVIDIA_CUDA_Programming_Guide_1.1.pdf.
3. Stratton, J.A., Stone, S.S., Hwu, W.W. 2008. MCUDA:
An efficient implementation of CUDA kernels on multicores. IMPACT Technical Report 08-01, University of
Illinois at Urbana-Champaign (February).
4. See reference 3.
5. Buck, I., Foley, T., Horn, D., Sugerman, J., Fatahalian, K., Houston, M., Hanrahan, P. 2004. Brook for GPUs:
Stream computing on graphics hardware.
Proceedings of SIGGRAPH (August): 777-786; http://doi.
6. Stone, S.S., Yi, H., Hwu, W.W., Haldar, J.P., Sutton,
B.P., Liang, Z.-P. 2007. How GPUs can improve the
quality of magnetic resonance imaging. The First
Workshop on General-Purpose Processing on Graphics
Processing Units (October).
7. Stone, J.E., Phillips, J.C., Freddolino, P.L., Hardy, D.J.,
Trabuco, L.G., Schulten, K. 2007. Accelerating molecular modeling applications with graphics processors.
Journal of Computational Chemistry 28(16): 2618–2640.
8. Nyland, L., Harris, M., Prins, J. 2007. Fast n-body
simulation with CUDA. In GPU Gems 3, ed. H. Nguyen. Addison-Wesley.
9. Golub, G.H., Van Loan, C.F. 1996. Matrix Computations, 3rd edition. Johns Hopkins University Press.
10. Buatois, L., Caumon, G., Lévy, B. 2007. Concurrent
number cruncher: An efficient sparse linear solver on
the GPU. Proceedings of the High-Performance Computation Conference (HPCC), Springer LNCS.
11. Sengupta, S., Harris, M., Zhang, Y., Owens, J.D. 2007.
Scan primitives for GPU computing. In Proceedings of
Graphics Hardware (August): 97–106.
12. See reference 3.
Links to the latest version of the CUDA development tools,
documentation, code samples, and user discussion forums can
be found at: http://www.nvidia.com/CUDA.
JOHN NICKOLLS is director of architecture at NVIDIA for
GPU computing. He was previously with Broadcom, Silicon
Spice, Sun Microsystems, and was a cofounder of MasPar
Computer. His interests include parallel processing systems,
languages, and architectures. He has a B.S. in electrical engineering and computer science from the University of Illinois,
and M.S. and Ph.D. degrees in electrical engineering from Stanford University.
IAN BUCK works for NVIDIA as the GPU-Compute software
manager. He completed his Ph.D. at the Stanford Graphics Lab in 2004. His thesis was titled “Stream Computing
on Graphics Hardware,” researching programming models
and computing strategies for using graphics hardware as
a general-purpose computing platform. His work included
developing the Brook software tool chain for abstracting the
GPU as a general-purpose streaming coprocessor.
MICHAEL GARLAND is a research scientist with NVIDIA
Research. Prior to joining NVIDIA, he was an assistant professor in the department of computer science at the University
of Illinois at Urbana-Champaign. He received Ph.D. and
B.S. degrees from Carnegie Mellon University. His research
interests include computer graphics and visualization, geometric algorithms, and parallel algorithms and programming models.
KEVIN SKADRON is an associate professor in the department of computer science at the University of Virginia
and is currently on sabbatical with NVIDIA Research. He
received his Ph.D. from Princeton University and B.S. from
Rice University. His research interests include power- and
temperature-aware design, and manycore architecture and
programming models. He is a senior member of the ACM.
© 2008 ACM 1542-7730/08/0300 $5.00