The advent of multicore CPUs and manycore GPUs
means that mainstream processor chips are now parallel
systems. Furthermore, their parallelism continues to scale
with Moore’s law. The challenge is to develop mainstream
application software that transparently scales its parallelism to leverage the increasing number of processor cores,
much as 3D graphics applications transparently scale
their parallelism to manycore GPUs with widely varying
numbers of cores.
According to conventional wisdom, parallel programming is difficult. Early experience with the CUDA [1, 2]
scalable parallel programming model and C language,
however, shows that many sophisticated programs can be
readily expressed with a few easily understood abstractions. Since NVIDIA released CUDA in 2007, developers
have rapidly developed scalable parallel programs for
a wide range of applications, including computational
chemistry, sparse matrix solvers, sorting, searching, and
physics models. These applications scale transparently to
hundreds of processor cores and thousands of concurrent
threads. NVIDIA GPUs with the new Tesla unified graphics and computing architecture (described in the GPU
sidebar) run CUDA C programs and are widely available
in laptops, PCs, workstations, and servers. The CUDA
Is CUDA the parallel programming model that
application developers have been waiting for?