when no SIMD hardware is detected: the status quo.
Resource utilization is a key tenet
of the cloud-computing paradigm that
is now attracting many companies.
Maximizing the utilization potential of
cloud resources is vital to lowering cost
and increasing performance. Because
of their popularity and interoperability, interpreted languages, mainly Java,
are frequently chosen for cloud computing. Interpreted languages within cloud environments do not expose processor-architecture features to the programmer, and as a result, the generated code is rarely optimized for the target platform's processor resources. More efficient use of existing data-center resources would surely result in cost reduction.
Interpreted languages such as Java,
Flash, PHP, and JavaScript may all
benefit from such an approach. There
are many potential arguments for and
against SIMD intrinsics in interpreted
languages, but it is clear that change
is needed.
Existing Solutions
As the number of cores on modern processors grows ever higher, so does the number of available SIMD units.
Most native compilers allow for explicit declaration of SIMD instructions in the form of intrinsics, while others go even further by extending the compiler front end with vector pragmas.5 This closes the gap between the application and the underlying architecture. One study has shown it is possible to create a common set of macros for calling many different SIMD units, including MMX, SSE, AltiVec, and TriMedia.13 This set of macros, called MMM, provides an abstraction for SIMD unit programming that makes programs more portable across hardware platforms. Although this initial approach did provide code portability, it did not address interpreted languages. A similar approach is also available at the compiler level.10
Whereas some compilers can selectively specialize code at compile time using late binding,3 it is simpler to let the user make parallelism explicit within the code, resolving the parallelism at runtime.
Interpreted languages do not expose vector functionality to the programmer in a transparent way. This is becoming an important problem, since high-performance computations are increasingly being carried out using languages such as Java rather than Fortran.2 One solution addressing this performance gap for interpreted languages is AMD's Aparapi,1 which provides thread-level parallelism and access to video-card acceleration but no SIMD support. Intel has also exposed some native SIMD support in its recent Intel Math Kernel Library in the form of a library package.6
Arguments for Change
The arguments for supporting the inclusion of premapped vector intrinsics within interpreted languages are
faster application runtime, lower cost,
smaller code size, fewer coding errors,
and a more transparent programming
experience.
Hardware continues to change, but
interpreted languages are not keeping pace with these changes, often relying solely on the default data path.
The integration of SIMD instructions into virtual machines and their matching language specifications allows for greater utilization of available resources. Direct vector-operation mappings will yield higher utilization of the SIMD unit, and therefore the code will run faster. Multiprocessors today
and in the future will contain a SIMD
hardware core for each processor, magnifying the disparity between sequential and parallel code. For example, in
Intel’s Core i7-920, an x86-64 processor with four physical cores, each core
contains an SSE 4.2 instruction-set extension. A virtual machine can take advantage of four SSE units at once, out-