Article development led by
The increasing significance of intermediate
representations in compilers.
BY FRED CHOW
Program compilation is a complicated process.
A compiler is a software program that translates a
high-level source-language program into a form ready
to execute on a computer. Early on in the evolution
of compilers, designers introduced intermediate
representations (IRs, also commonly called intermediate
languages) to manage the complexity
of the compilation process. The use of
an IR as the compiler’s internal representation of the program enables the
compiler to be broken up into multiple phases and components, thus
benefiting from modularity.
An IR is any data structure that can
represent the program without loss of
information so that its execution can
be conducted accurately. It serves as
the common interface among the
compiler components. Since its use is
internal to a compiler, each compiler
is free to define the form and details
of its IR, and its specification needs to
be known only to the compiler writers.
Its existence can be transient during
the compilation process, or it can be
output and handled as text or binary
files. An IR should be general so that
it is capable of representing programs
translated from multiple languages.
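To make the idea of an IR as a shared data structure concrete, here is a minimal sketch in Python. It is not taken from any particular compiler: the node classes `Const`, `Var`, and `BinOp` and the helpers `dump` and `evaluate` are hypothetical names chosen for illustration. The sketch shows one small expression tree being consumed by two independent components, one that writes it out as text and one that executes it.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str            # "+" or "*" in this sketch
    left: "Node"
    right: "Node"


Node = Union[Const, Var, BinOp]


def dump(node: Node) -> str:
    """Write the IR out as prefix-notation text."""
    if isinstance(node, Const):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    return f"({node.op} {dump(node.left)} {dump(node.right)})"


def evaluate(node: Node, env: Dict[str, int]) -> int:
    """Execute the IR directly, looking up variables in env."""
    if isinstance(node, Const):
        return node.value
    if isinstance(node, Var):
        return env[node.name]
    left = evaluate(node.left, env)
    right = evaluate(node.right, env)
    if node.op == "+":
        return left + right
    if node.op == "*":
        return left * right
    raise ValueError(f"unknown operator: {node.op}")


# IR for the source expression a + b * 2
tree = BinOp("+", Var("a"), BinOp("*", Var("b"), Const(2)))
text = dump(tree)
value = evaluate(tree, {"a": 1, "b": 3})
```

Because both `dump` and `evaluate` depend only on the node definitions, either one could be replaced or extended without touching the other, which is the modularity benefit described above.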
Compiler writers traditionally refer
to the semantic content of programming
languages as being high. The
semantic content of machine-executable
code is considered low because it
has retained only enough information
from the original program to allow its
correct execution. It would be difficult
(if not impossible) to recreate the
source program from its lower form.
The compilation process entails the
gradual lowering of the program representation
from high-level human
programming constructs to low-level
real or virtual machine instructions
(see Figure 1). In order for an IR to be
capable of representing multiple languages,
it needs to be closer to the machine
level to represent the execution
behavior of all the languages. Machine-executable
code is usually a longer code
sequence than its source because it
reflects the details of the machines on
which execution takes place.
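The lowering step can be illustrated with a small Python sketch. It assumes a made-up register machine with four instructions (`load`, `li`, `add`, `mul`) and an unlimited supply of registers; these mnemonics and the `lower` function are hypothetical and do not correspond to any real instruction set or compiler. Expressions are written as nested tuples of the form `(op, left, right)`.

```python
import itertools
from typing import List, Union

Expr = Union[str, int, tuple]  # variable name, constant, or (op, left, right)

MNEMONIC = {"+": "add", "*": "mul"}  # hypothetical instruction names


def lower(expr: Expr) -> List[str]:
    """Flatten an expression tree into a list of low-level instructions."""
    code: List[str] = []
    regs = itertools.count(1)  # fresh register numbers: r1, r2, ...

    def visit(node: Expr) -> str:
        if isinstance(node, tuple):
            op, left, right = node
            lhs = visit(left)      # operands are lowered first
            rhs = visit(right)
            dest = f"r{next(regs)}"
            code.append(f"{MNEMONIC[op]} {dest}, {lhs}, {rhs}")
        elif isinstance(node, str):
            dest = f"r{next(regs)}"
            code.append(f"load {dest}, {node}")   # fetch a variable from memory
        else:
            dest = f"r{next(regs)}"
            code.append(f"li {dest}, {node}")     # load an immediate constant
        return dest

    visit(expr)
    return code


# The one-line source expression a + b * 2 ...
instructions = lower(("+", "a", ("*", "b", 2)))
# ... becomes five instructions:
#   load r1, a
#   load r2, b
#   li r3, 2
#   mul r4, r2, r3
#   add r5, r1, r4
```

The single source expression expands into five instructions because the lowered form must spell out each memory access and each intermediate result, details that the high-level form leaves implicit.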
A well-designed IR should be translatable into different forms for execution on multiple platforms. For execution on a target processor or CPU, it
needs to be translated into the assembly language of that processor, which
usually is a one-to-one mapping to