calculated at compile time and replaces them with the result of the calculation directly.
˲ Constant propagation. The compiler tracks the provenance of values
and takes advantage of knowing that
certain values are constant for all possible executions.
˲ Common subexpression elimination. Duplicated calculations are rewritten to calculate once and duplicate the result.
˲ Dead code removal. After many of
the other optimizations, there may be
areas of the code that have no effect on
the output, and these can be removed.
This includes loads and stores whose
values are unused, as well as entire
functions and expressions.
˲ Instruction selection. This is not an
optimization as such, but as the compiler takes its internal representation
of the program and generates CPU
instructions, it usually has a large set
of equivalent instruction sequences
from which to choose. Making the
right choice requires the compiler to
know a lot about the architecture of
the processor it’s targeting.
˲ Loop invariant code movement. The
compiler recognizes that some expressions within a loop are constant for the
duration of that loop and moves them
outside of the loop. On top of this,
the compiler is able to move a loop-invariant conditional check out of a
loop and then duplicate the loop: one
copy for when the condition is true, and
one for when it is false. This can lead to further optimizations.
˲ Peephole optimizations. The compiler takes short sequences of instructions and looks for local optimizations
between those instructions.
˲ Tail call removal. A recursive function that ends in a call to itself can often be rewritten as a loop, reducing call
overhead and the chance of
stack overflow.
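As a sketch of that last point (the names here are mine, not from the article), consider a tail-recursive sum and the loop a compiler can turn it into:

```cpp
#include <cassert>

// Tail-recursive sum of 1..n: the recursive call is the very last thing
// the function does, so the compiler can reuse the current stack frame.
long sumTo(long n, long acc = 0) {
    if (n == 0) return acc;
    return sumTo(n - 1, acc + n);  // tail call: no work remains after it
}

// The loop an optimizing compiler effectively produces.
long sumToLoop(long n) {
    long acc = 0;
    for (; n != 0; --n) acc += n;
    return acc;
}
```

With the optimization applied, sumTo runs in constant stack space regardless of n.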
The golden rule for helping the
compiler optimize is to ensure it has
as much information as possible to
make the right optimization decisions. One source of information is
your code: If the compiler can see
more of your code, it’s able to make
better decisions. Another source of
information is the compiler flags you
use: telling your compiler the exact
CPU architecture you are targeting
can make a big difference. Of course,
the more information a compiler
has, the longer it could take to run, so
there is a balance to be struck here.
Let’s take a look at an example (d),
counting the number of elements of a
vector that pass some test (compiled
with GCC, optimization level 3;
https://godbolt.org/z/acm19_count1). If the
compiler has no information about
testFunc, it will generate an inner
loop like (e).
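The source in (d) is along these lines (a minimal sketch; the function and variable names are illustrative, not necessarily the article's):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In the scenario described, this definition lives in another translation
// unit, so the compiler optimizing countPassing cannot see its body.
bool testFunc(int x) { return x > 0; }

// Count the elements of vec that pass testFunc.
int countPassing(const std::vector<int>& vec) {
    int numPassed = 0;
    for (std::size_t i = 0; i < vec.size(); ++i)
        if (testFunc(vec[i]))
            ++numPassed;
    return numPassed;
}
```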
To understand this code, it’s useful
to know that a std::vector<> contains some pointers: one to the beginning of the data; one to the end of the
data; and one to the end of the storage currently allocated (f). The size
of the vector is not directly stored; it’s
implied by the difference between the
begin() and end() pointers. Note
that the calls to vector<>::size()
and vector<>::operator[] have
been inlined completely.
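A simplified model of that layout (illustrative only; real library implementations differ in detail):

```cpp
#include <cassert>
#include <cstddef>

// Three-pointer model of std::vector's storage. The size is not stored;
// it is recomputed as the distance between the begin and end pointers.
struct VecModel {
    int* begin_;          // first element
    int* end_;            // one past the last element
    int* endOfStorage_;   // one past the end of allocated capacity

    std::size_t size() const {
        return static_cast<std::size_t>(end_ - begin_);
    }
    std::size_t capacity() const {
        return static_cast<std::size_t>(endOfStorage_ - begin_);
    }
};
```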
In the assembly code (e), rbp points
to the vector object, and the begin()
and end() pointers are therefore
QWORD PTR [rbp+0] and QWORD PTR
[rbp+8], respectively.
Another neat trick the compiler has
done is to remove any branching: you
might reasonably expect if (testFunc(...)) would turn into a comparison and branch. Here the compiler
does the comparison cmp al, 1,
which sets the processor carry flag if
testFunc() returned false, otherwise
it clears it. The sbb r12d, -1 instruction then subtracts -1 with borrow, the
subtract equivalent of carrying, which
also uses the carry flag. This has the desired side effect: If the carry is clear
(testFunc() returned true), it subtracts -1, which is the same as adding
1. If the carry is set, it subtracts -1 + 1,
which has no effect on the value. Avoiding branches can be advantageous in
some cases if the branch is not easily
predictable by the processor.
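The same branch-free idea can be expressed directly in source by adding the boolean result (0 or 1) instead of branching on it. This sketch is mine, not the article's:

```cpp
#include <cassert>
#include <vector>

// Count positive values without an if: the comparison yields a bool that
// converts to 0 or 1, which compilers often lower to setcc/sbb sequences
// rather than a conditional branch.
int countPositive(const std::vector<int>& v) {
    int n = 0;
    for (int x : v)
        n += (x > 0);  // no data-dependent branch in the loop body
    return n;
}
```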
It may seem surprising that the compiler reloads the begin() and end()
pointers each loop iteration, and indeed it rederives size() each time
too. However, the compiler is forced to
do so: it has no idea what testFunc()
does and must assume the worst.
That is, it must assume calls to testFunc() may cause vec to be modified. The const reference here doesn’t
allow any additional optimizations,
for a couple of reasons: testFunc() may
have a non-const reference to vec
(perhaps through a global variable), or
testFunc() might cast away const.
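A hypothetical example of the first escape hatch (my own construction, not the article's code): testFunc() reaching the same vector through a global, despite the caller holding only a const reference.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

std::vector<int> g_vec{1, 2, 3};  // also reachable as a global

bool testFunc(int x) {
    if (x == 2) g_vec.push_back(99);  // mutates the vector mid-loop
    return x > 1;
}

// v is a const reference, but when called as count(g_vec) it aliases the
// global, so size() really can change between iterations: the compiler
// is right to re-read it every time around the loop.
int count(const std::vector<int>& v) {
    int numPassed = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        if (testFunc(v[i]))
            ++numPassed;
    return numPassed;
}
```

The element appended during the second iteration is still visited, because the loop re-derives size() on every pass.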
If, however, the compiler can see
the body of testFunc(), and from
this know that it does not in fact
modify vec (g), the story is very different