Data parallelism is a key concept in leveraging the power of todays manycore GPUs.

Users always care about performance.
Although often it’s just a matter of making
sure the software is doing only what it should, there are
many cases where it is vital to get down to the metal and
leverage the fundamental characteristics of the processor.

Until recently, performance improvement was not difficult. Processors just kept getting faster. Waiting a year for the customer’s hardware to be upgraded was a valid optimization strategy. Nowadays, however, individual processors don’t get much faster; systems just get more of them.

Much comment has been made on coding paradigms to target multiple-processor cores, but the data-parallel paradigm is a newer approach that may just turn out to be easier to code to, and easier for processor manufacturers to implement.

This article provides a high-level description of data-parallel computing and some practical information on how and where to use it. It also covers data-parallel programming environments, paying particular attention to those based on programmable graphics processors.

A BIT OF BACKGROUND

Although the rate of processor-performance growth seems almost magical, it is gated by fundamental laws of physics. For the entire decade of the ’90s, these laws enabled processors to grow exponentially in performance as a result of improvements in gates-per-die, clock speed, and instruction-level parallelism. Beginning in 2003, though, the laws of physics (power and heat) put an end to growth in clock speed. Then the silicon area requirements

References:

http://www.acmqueue.com

Archives