IMAGE BY ALICIA KUBISTA / ANDRIJ BORYS ASSOCIATES
tiple-data execution model. Let’s assume you have a large matrix and need to multiply each element of this matrix by a constant. With a traditional core, this is done one element at a time or, at most, a few elements at a time. With a GPU, you can multiply all the elements at once, or in very few iterations if the matrix is very large. The GPU excels at applying the same independent operation to large amounts of data.
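To make the contrast concrete, here is a minimal Python sketch of the matrix-scaling example. It does not run on a GPU; it only models the step count, assuming a hypothetical device that applies the same multiply to `lanes` elements per step. The function names and the `lanes` parameter are illustrative assumptions, not part of any real GPU API.

```python
def scale_sequential(matrix, k):
    """Multiply one element per step, as a single traditional core would."""
    out = [row[:] for row in matrix]
    steps = 0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            out[i][j] = value * k
            steps += 1
    return out, steps


def scale_data_parallel(matrix, k, lanes):
    """Model a data-parallel device: each step handles up to `lanes` elements.

    `lanes` is a hypothetical parameter standing in for the hardware width.
    """
    cols = len(matrix[0])
    flat = [value for row in matrix for value in row]
    result = []
    steps = 0
    for start in range(0, len(flat), lanes):
        # On real hardware these multiplies would happen simultaneously;
        # this loop only counts how many such steps are needed.
        result.extend(value * k for value in flat[start:start + lanes])
        steps += 1
    out = [result[r * cols:(r + 1) * cols] for r in range(len(matrix))]
    return out, steps


matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]
seq, seq_steps = scale_sequential(matrix, 3)
par, par_steps = scale_data_parallel(matrix, 3, lanes=4)
print(seq_steps, par_steps)
```

Both functions produce the same scaled matrix; only the number of steps differs, and it shrinks as the assumed lane count grows.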
Another computing paradigm that deviates from the traditional sequential scheme is the FPGA (field-programmable gate array). We all know that software and hardware are logically equivalent, meaning what you can do with software you can also do with hardware. Hardware solutions are much faster but inflexible. The FPGA tries to close this gap. It is a circuit that can be configured by the programmer to implement a certain function. Suppose you need to calculate a polynomial function on a group of elements. A single polynomial function is compiled to tens of assembly instructions. An FPGA is a good choice if the number of elements on which the function must be calculated is not large enough to require a GPU, and not small enough to be handled efficiently by a traditional core.
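The polynomial case can be sketched in a few lines of Python. This is a software illustration only, assuming Horner's rule as the evaluation scheme (the article does not name one); the coefficients and the helper names `horner` and `evaluate_all` are hypothetical. It shows the short, fixed sequence of multiply-add steps that is repeated for every element, which is the kind of fixed function a configurable circuit could implement directly.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x.

    coeffs are ordered from the highest-degree term down to the constant.
    """
    acc = 0
    for c in coeffs:
        acc = acc * x + c  # one multiply and one add per coefficient
    return acc


def evaluate_all(coeffs, elements):
    """Apply the same polynomial to each element of a group."""
    return [horner(coeffs, x) for x in elements]


# Hypothetical polynomial: 2x^3 - 3x^2 + 5
coeffs = [2, -3, 0, 5]
values = evaluate_all(coeffs, [0, 1, 2, 3])
print(values)
```

Each call to `horner` performs the same handful of operations regardless of the input value, so the work per element is small and completely regular.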
FPGAs have been used in many high-performance clusters. With Intel’s acquisition last year of Altera, one of the big players in the FPGA market, tighter integration of FPGAs and traditional cores is expected. Also, Microsoft has started using FPGAs in its datacenter (Project Catapult).
A new member recently added to the computing-node options is the AP (Automata Processor) from Micron.³ The AP is very well suited for graph analysis, pattern matching, data analytics, and statistics. Think of it as a hardware regular-expression accelerator that works in parallel. If you can formulate the problem at hand as a regular expression, then you can expect to get much higher performance than a GPU could provide. The AP is built using FPGAs but designed to be more efficient in regular-expression processing.
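One way to picture a regular-expression accelerator that works in parallel is as a nondeterministic automaton in which every active state consumes the same input symbol in the same step. The Python sketch below is a software model under that assumption, not Micron's programming interface; the pattern `ab*c`, the transition table, and the function name `match_ends` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical hand-built automaton for the regular expression ab*c.
# TRANSITIONS[state][symbol] gives the set of next states.
TRANSITIONS = {
    0: {"a": {1}},
    1: {"b": {1}, "c": {2}},
    2: {},
}
START, ACCEPT = 0, 2


def match_ends(text):
    """Return the index just past every substring of text matching ab*c."""
    active = set()
    ends = []
    for pos, symbol in enumerate(text):
        active.add(START)  # a match may begin at any position
        # Every active state consumes the same symbol in the same step.
        active = {
            nxt
            for state in active
            for nxt in TRANSITIONS[state].get(symbol, ())
        }
        if ACCEPT in active:
            ends.append(pos + 1)
    return ends


text = "xacabbcab"
ends = match_ends(text)
# Cross-check against Python's own regular-expression engine.
expected = [m.end() for m in re.finditer(r"ab*c", text)]
print(ends, expected)
```

In software the set of active states is updated with a loop; the appeal of dedicated hardware is that this whole update could happen at once for each input symbol.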
Aside from the aforementioned computing nodes, there are many other processing nodes such as the DSP (digital signal processor) and ASIC (application-specific integrated circuit).
Those target small niches of applica-