apply similarly on other architectures.
Additionally, I cover only the GCC and
Clang compilers, but equally clever
optimizations show up on compilers
from Microsoft and Intel.
Optimization 101
This is far from a deep dive into compiler optimizations, but some concepts
used by compilers are useful to know.
In these pages, you will note a running
column of examples of scripts and instructions for the processes and operations discussed. All are linked by the
corresponding (letter).
Many optimizations fall under the
umbrella of strength reduction: taking
expensive operations and transforming
them to use less expensive ones. A very
simple example of strength reduction
would be taking a loop with a multiplication involving the loop counter, as shown
in (b). Even on today’s CPUs, multiplication is a little slower than simpler arithmetic, so the compiler will rewrite that
loop to be something like (c).
Here, strength reduction took a loop
involving multiplication and turned
it into a sequence of equivalent operations using only addition. There
are many forms of strength reduction,
more of which show up in the practical
examples given later.
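To make the transformation concrete, here is a minimal sketch of the idea written by hand in C++. The function names and the summing loop are hypothetical and chosen only for illustration; they are not the article's listings. Unsigned arithmetic is assumed so that both versions wrap identically on overflow, and `constexpr` is used only so the equivalence can be checked at compile time.

```cpp
#include <cstdint>

// Straightforward version: one multiplication per loop iteration.
constexpr std::uint32_t sum_scaled_multiply(std::uint32_t count,
                                            std::uint32_t stride) {
    std::uint32_t total = 0;
    for (std::uint32_t i = 0; i < count; ++i) {
        total += i * stride;  // multiplication involving the loop counter
    }
    return total;
}

// Hand-applied strength reduction: the product i * stride is replaced
// by a running value that is updated with a single addition.
constexpr std::uint32_t sum_scaled_add(std::uint32_t count,
                                       std::uint32_t stride) {
    std::uint32_t total = 0;
    std::uint32_t scaled = 0;  // equals i * stride at the top of each iteration
    for (std::uint32_t i = 0; i < count; ++i) {
        total += scaled;
        scaled += stride;      // addition replaces the multiplication
    }
    return total;
}
```

Both functions return the same value for any inputs. An optimizing compiler may go further than this sketch, so the generated assembly will not necessarily mirror either version line for line.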
Another key optimization is inlining,
in which the compiler replaces a call to
a function with the body of that function. This removes the overhead of the
call and often unlocks further optimizations, as the compiler can optimize
the combined code as a single unit. You
will see plenty of examples of this later.
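As a rough sketch of what inlining means at the source level, consider the hypothetical functions below (the names are illustrative assumptions, not taken from the article). The third function shows what the compiler can effectively work with once the calls to `square` are replaced by its body, and the fourth shows how a constant argument then lets the whole call collapse. Whether a given call is actually inlined depends on the optimization level and the compiler's heuristics.

```cpp
// A small helper that is a natural candidate for inlining.
constexpr int square(int x) { return x * x; }

// As written: two function calls.
constexpr int sum_of_squares(int a, int b) {
    return square(a) + square(b);
}

// After inlining (conceptually): the calls are gone, and the combined
// expression can be optimized as a single unit.
constexpr int sum_of_squares_inlined(int a, int b) {
    return a * a + b * b;
}

// With a constant argument, inlining exposes a value the compiler can
// compute ahead of time, so this can reduce to simply returning 9.
constexpr int nine() { return square(3); }
```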
Other optimization categories
include:
• Constant folding. The compiler
takes expressions whose values can be
sprites and polygons for fast processing
of financial data. Just as before,
knowing what the compiler was doing
with code helped inform the way we
wrote the code.
Obviously, nicely written, testable
code is extremely important—
especially if that code has the potential to
make thousands of financial transactions per second. Being fastest is
great, but not having bugs is even
more important.
In 2012, we were debating which of
the new C++11 features could be adopted as part of the canon of acceptable coding practices. When every nanosecond
counts, you want to be able to give advice
to programmers about how best to write
their code without being antagonistic
to performance. While experimenting
with how code uses new features such as
auto, lambdas, and range-based for, I
wrote a shell script (a) to run the compiler continuously and show its filtered
output. This proved so useful in answering all these “what if?” questions that
I went home that evening and created
Compiler Explorer [1].
Over the years I have been constantly
amazed by the lengths to which compilers go in order to take our code and
turn it into a work of assembly code
art. I encourage all compiled language
programmers to learn a little assembly
in order to appreciate what their compilers are doing for them. Even if you
cannot write it yourself, being able to
read it is a useful skill.
All the assembly code shown here
is for 64-bit x86 processors, as that is
the CPU I’m most familiar with and it
is one of the most common server
architectures. Some of the examples
shown here are x86-specific, but in
reality, many types of optimizations