vantage of the available system architecture and features, especially SIMD
units, can be very challenging for the
developer. Such features are often disregarded in favor of costly but scalable
measures to increase performance,
mainly the addition of processing power. Virtual machines and interpreted languages have consistently abstracted away from the underlying system architecture.
SIMD instructions were added to modern processors to increase data-level parallelism. In a SIMD instruction, multiple distinct data elements can be manipulated with one common operation. SIMD units
typically include an additional register
bank with a large register width. These
individual registers can be subdivided
to hold multiple data elements of varying data types.
Developers have begun to take note that the CPU's SIMD capabilities are underutilized.
In interpreted languages, a portion of the software-to-hardware mapping occurs on the fly
and in real time, leaving little time for
thorough instruction-level parallelism analysis and optimization. Bytecode compilation can be used to identify parallelism opportunities but has
proven to be ineffective in many realistic scenarios. Exhaustive automated
SIMD identification and vectorization
is too computationally intensive to occur within JIT (just-in-time) compilers.
It is now common to find vector operations on arrays being performed in native languages with the help of SIMD-based instruction-set extensions (for example, AltiVec, SSE, VIS), general-purpose computing on graphics cards (for example, Nvidia CUDA, ATI STREAM), and memory modules. AltiVec, SSE, and VIS are well-known SIMD instruction sets. AltiVec is commonly found on PowerPC systems, SSE on x86-64, and VIS within SPARC.
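As a hypothetical example of this kind of explicit vectorization in a native language (the function name and structure are the editor's sketch, not the article's), the routine below uses SSE intrinsics to add two float arrays four elements per instruction, with a scalar loop handling any leftover elements.

#include <xmmintrin.h>   /* SSE intrinsics */

/* Add two float arrays four elements at a time. */
void vec_add(float *out, const float *a, const float *b, int n)
{
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);               /* load four floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));    /* four additions in one instruction */
    }
    for (; i < n; i++)                                 /* scalar remainder */
        out[i] = a[i] + b[i];
}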