domains using similar data types are seismic and medical imaging.
FUTURE HARDWARE EVOLUTION
Processor features such as instruction formats will likely
converge as a result of pressure for a consistent programming model. GPUs may migrate to narrower SIMD widths
to increase performance on branching code, while CPUs
move to broader SIMD widths to improve instruction throughput.
The fact remains, however, that some tasks can be
executed more efficiently using data-parallel algorithms.
Since efficiency is so critical in this era of constrained
power consumption, a two-point design that enables the
optimal mapping of tasks to each processor model may
persist for some time to come.
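To make the branching point concrete, here is a minimal CUDA sketch (an illustration, not code from the article) of a data-dependent branch. When lanes of a warp disagree about which way the branch goes, both paths execute with the inactive lanes masked off, so the wider the SIMD unit, the more work is wasted on divergent code; a narrow SIMD or scalar core pays far less.

```cuda
// Illustrative kernel with a data-dependent branch. On SIMD hardware,
// threads grouped into one warp/vector that take different paths cause
// both paths to be executed (with lanes masked), so wide SIMD loses
// more efficiency on branchy code than narrow SIMD or scalar cores do.
__global__ void branchy(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    if (in[i] > 0.0f)
        out[i] = sqrtf(in[i]) * 2.0f;   // "expensive" path
    else
        out[i] = 0.0f;                  // "cheap" path
}
```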
Further, if hardware continues to lead software, systems are likely to have more cores than applications can exploit at any given time, so offering a choice of processor types increases the chance that more of those cores will be put to use.
Conceivably, a data-parallel system could support the
entire feature set of a modern serial CPU core, including a
rich set of interthread communications and synchronization mechanisms. The presence of such features, however,
may not matter in the longer term because the more such
traditional synchronization features are used, the worse
performance will scale to high core counts. The fastest
apps are not those that port their existing single-threaded
or even dual-threaded code across, but those that switch
to a different parallel algorithm that scales better because
it relies less on general synchronization capabilities.
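As one illustration of that trade-off (a sketch, not code from the article), compare a direct port of a serial accumulation, which funnels every update through a single atomically protected counter, with a data-parallel tree reduction whose only synchronization is a barrier inside each thread block:

```cuda
// Naive port of a serial accumulation: every thread contends on one
// global counter, so the atomic serializes the work and scaling stalls
// as core counts grow.
__global__ void sum_atomic(const float *in, float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(out, in[i]);
}

// Data-parallel alternative: a tree reduction in shared memory.
// The only synchronization is a barrier among the threads of one
// block, plus a single atomic per block, so contention does not
// grow with the size of the input.
__global__ void sum_reduce(const float *in, float *out, int n)
{
    extern __shared__ float s[];        // blockDim.x floats, supplied at launch
    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) s[tid] += s[tid + stride];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, s[0]);
}
```

A launch such as `sum_reduce<<<blocks, 256, 256 * sizeof(float)>>>(d_in, d_sum, n)` supplies the dynamic shared memory; the point is that synchronization cost stays constant per block rather than growing with the input, which is the kind of restructuring that scales to high core counts.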
Figure 2 shows a list of algorithms that have been
implemented using data-parallel paradigms with varying
degrees of success. They are sorted roughly in order of
how well they match the data-parallel model.
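One algorithm that recurs throughout this literature, including the GPU Gems 3 chapter listed under further reading, is prefix sum (scan). The sketch below, assuming the input fits within a single thread block, shows the flavor of a computation that maps well to the data-parallel model; the work-efficient, multi-block variants are more involved.

```cuda
// Minimal single-block inclusive prefix sum (Hillis-Steele style).
// Assumes n <= blockDim.x and a launch with
// 2 * blockDim.x * sizeof(float) bytes of dynamic shared memory.
__global__ void scan_inclusive(const float *in, float *out, int n)
{
    extern __shared__ float buf[];
    float *src = buf;                   // ping-pong buffers in shared memory
    float *dst = buf + blockDim.x;
    int tid = threadIdx.x;

    src[tid] = (tid < n) ? in[tid] : 0.0f;
    __syncthreads();

    for (int offset = 1; offset < blockDim.x; offset <<= 1) {
        // Each element adds in the value 'offset' positions to its left.
        dst[tid] = (tid >= offset) ? src[tid] + src[tid - offset] : src[tid];
        __syncthreads();
        float *tmp = src; src = dst; dst = tmp;   // swap buffers
    }
    if (tid < n) out[tid] = src[tid];
}
```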
Data-parallel processors are becoming more broadly
available, especially now that consumer GPUs support
data-parallel programming environments. This paradigm
shift presents a new opportunity for programmers who
adapt in time.
Hardware and tool designers are looking for guidance from software developers. The first to arrive will have the best chance to drive and shape upcoming data-parallel hardware architectures and development environments to meet the needs of their particular applications.
When programmed effectively, GPUs can be faster
than current PC CPUs. The time has come to take advantage of this new processor type by making sure each
task in your code base is assigned to the processor and
memory model that is optimal for that task.
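One hypothetical way to act on that advice (the names and thresholds here are illustrative, not from the article) is a small host-side dispatcher that routes each task to the processor model it suits:

```cuda
#include <cstddef>

// Hypothetical host-side dispatch: route each task to the processor
// model that suits it. The fields and the cutoff are illustrative.
enum class Processor { CPU, GPU };

struct Task {
    size_t elements;         // amount of data the task touches
    bool   branch_heavy;     // divergent control flow?
    bool   needs_fine_sync;  // relies on fine-grained inter-thread sync?
};

Processor choose_processor(const Task &t)
{
    const size_t gpu_threshold = 1u << 16;   // illustrative cutoff
    // Small, branchy, or synchronization-heavy work favors a
    // latency-oriented CPU core; large uniform work favors the GPU.
    if (t.branch_heavy || t.needs_fine_sync || t.elements < gpu_threshold)
        return Processor::CPU;
    return Processor::GPU;
}
```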
SUGGESTED FURTHER READING
GPU Gems 2: gpu_gems_2_home.html
GPU Gems 3: gpu-gems-3.html (Chapter 39, on prefix sum)
Glift data structures: