practice
Heterogeneous Computing: Here to Stay
Hardware and software perspectives.
BY MOHAMED ZAHRAN
DOI: 10.1145/3024918
Article development led by queue.acm.org
Mentions of the phrase heterogeneous computing
have been on the rise in the past few years and will
continue to be heard for years to come, because
heterogeneous computing is here to stay. What is
heterogeneous computing, and why is it becoming
the norm? How do we deal with it, from both the
software side and the hardware side? This article
provides answers to some of these questions and
presents different points of view on others.
Let’s start with the easy questions. What is
heterogeneous computing? In a nutshell, it is a
scheme in which the different computing nodes
have different capabilities and/or different ways of
executing instructions. A heterogeneous system is
therefore a parallel system (single-core systems are
almost ancient history). When multicore systems
appeared, they were homogeneous—that is, all cores
were similar. Moving from sequential programming
to parallel programming, which used to be an area
only for niche programmers, was a big jump. In
heterogeneous computing, the cores are different.
Cores can have the same architectural capabilities: for example, the
same hyperthreading capability (or lack thereof), the same superscalar
width, the same vector-arithmetic support, and so on. Even cores that
are similar in those capabilities, however, exhibit some kind of
heterogeneity, because each core now has its own DVFS (dynamic voltage
and frequency scaling). A core that is doing more work gets warmer and
hence reduces its frequency and becomes, well, slower. Therefore, even
cores with the same specifications can be heterogeneous. This is the
first type of heterogeneity.
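To see this first type of heterogeneity in action, here is a minimal
sketch in C. It assumes a Linux system that exposes per-core DVFS state
through the cpufreq sysfs interface; on such a machine, nominally
identical cores often report different clock frequencies at the same
instant.

/* Print the current frequency of each core, assuming a Linux
 * system that exposes cpufreq via sysfs. Cores with identical
 * specifications will often report different values, because
 * each one scales its voltage and frequency independently. */
#include <stdio.h>

int main(void) {
    char path[128];
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof(path),
                 "/sys/devices/system/cpu/cpu%d/cpufreq/scaling_cur_freq",
                 cpu);
        FILE *f = fopen(path, "r");
        if (!f) break;          /* no more cores (or no cpufreq support) */
        long khz;
        if (fscanf(f, "%ld", &khz) == 1)
            printf("core %d: %.2f GHz\n", cpu, khz / 1e6);
        fclose(f);
    }
    return 0;
}

Comparing the output on an idle machine with the output under an uneven
load typically shows the reported frequencies drifting apart from one
core to another.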
The second type involves cores with different architectural
capabilities. One example is a processor with several simple cores (for
example, single-issue, with no out-of-order or speculative execution)
together with a few fat cores (for example, wide superscalar cores with
hyperthreading technology, out-of-order execution, and speculative
execution).
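Software can take advantage of this second type of heterogeneity by
steering work toward a particular kind of core. The following sketch
assumes Linux and assumes, purely for illustration, that core 4 happens
to be one of the fat cores; the actual mapping of core numbers to core
types is platform specific.

/* Pin the calling thread to one core, e.g., to keep a demanding
 * thread on a "fat" core. Assumes Linux; the choice of core 4 as
 * a fat core is a hypothetical, platform-specific mapping. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(4, &set);           /* hypothetical fat core */
    if (sched_setaffinity(0, sizeof(set), &set) != 0) {
        perror("sched_setaffinity");
        return 1;
    }
    printf("now restricted to core 4\n");
    /* ... run the demanding work here ... */
    return 0;
}

The same effect can be achieved for an individual thread with
pthread_setaffinity_np on systems that provide it.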
These first two types of heterogeneity involve cores with the same
execution model of sequential programming; that is, each core appears
to execute instructions in sequence even if, under the hood, there is
some kind of parallelism among instructions. On such a multicore
machine you may write parallel code, but each thread (or process) is
executed by its core in a seemingly sequential manner. What if
computing nodes are included that don't work like that? This is the
third type of heterogeneity.
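The following minimal sketch, assuming a POSIX-threads environment
(compile with -pthread), illustrates that execution model: the program
as a whole runs in parallel, yet each thread steps through its own
instructions in program order.

/* Two threads run concurrently, but within each thread the
 * instructions appear to execute one after another, exactly
 * as in sequential programming. */
#include <pthread.h>
#include <stdio.h>

static void *worker(void *arg) {
    long id = (long)arg;
    long sum = 0;
    for (long i = 0; i < 1000000; i++)  /* sequential from this thread's view */
        sum += i;
    printf("thread %ld: sum = %ld\n", id, sum);
    return NULL;
}

int main(void) {
    pthread_t t[2];
    for (long i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, worker, (void *)i);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return 0;
}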
In this type of heterogeneity, the computing nodes have different
execution models. Several different kinds of nodes exist here. The most
famous is the GPU (graphics processing unit), now used in many
applications besides graphics. For example, GPUs are used heavily in
deep learning, especially for training. They are also used in many
scientific applications and deliver performance that is orders of
magnitude better than that of traditional cores. The reason for this
performance boost is that a GPU uses the single-instruction (or
thread), mul-