Figure 1. Throughput-oriented processors like the NVIDIA Tesla C2050 deliver substantially higher performance on intrinsically parallel computations, including molecular dynamics simulations.
released its first GPU supporting the CUDA parallel computing architecture in 2006 and is currently shipping its third-generation CUDA architecture, code-named “Fermi” [24], which was released in 2010 in the Tesla C2050 and other processors.
Figure 1 shows a nucleosome structure (25,095 atoms) used to benchmark the AMBER suite of molecular dynamics simulation programs. Many core computations in molecular dynamics are intrinsically parallel, and AMBER recently added CUDA-accelerated computations (http://ambermd.org/gpus/). Its Generalized Born implicit solvent calculation for this system, running on the eight cores of a dual four-core Intel Xeon E5462, executes at a rate of 0.06 nanoseconds of simulation time per day of computation. The same calculation running on an NVIDIA Tesla C2050 executes at 1.04 ns/day: roughly 144 times more work per day than a single sequential core, and just over 17 times the throughput of all eight cores.
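The throughput comparison above is a ratio of two rates in the same units (ns/day). The short sketch below reproduces that arithmetic from the two rates quoted in the text. The single-core rate is not reported, so the sketch only back-calculates the rate implied by the "roughly 144 times" figure; treat it as an illustrative derivation, not a measured number. Because the eight-core rate is given to only two decimal places, all derived ratios are approximate.

```python
# Rates quoted in the text, in nanoseconds of simulated time per day of computation.
XEON_8_CORE_NS_PER_DAY = 0.06   # dual four-core Intel Xeon E5462, all eight cores
TESLA_C2050_NS_PER_DAY = 1.04   # NVIDIA Tesla C2050


def speedup(fast_rate: float, slow_rate: float) -> float:
    """Ratio of two throughputs measured in the same units (ns/day)."""
    return fast_rate / slow_rate


# GPU throughput relative to all eight CPU cores (the "just over 17 times" claim).
gpu_vs_8_cores = speedup(TESLA_C2050_NS_PER_DAY, XEON_8_CORE_NS_PER_DAY)

# Assumption: the single-core rate is not given in the text; this back-calculates
# the rate implied by the "roughly 144 times" comparison.
implied_single_core = TESLA_C2050_NS_PER_DAY / 144

print(f"GPU vs. 8 cores: {gpu_vs_8_cores:.1f}x")
print(f"Implied single-core rate: {implied_single_core:.4f} ns/day")
```

Running it prints a GPU-to-eight-core ratio of about 17.3x, consistent with "just over 17 times" in the text.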