Compiled with –Ospace, –O3
Compiled with –Ospace, –O3, --thumb
Execution Time
(microseconds)
2.467
3.050
Program Size*
(bytes)
134
104
bar
main
*Program size reflects object file that includes main and all associated data,
but not startup code.
B
STR
BL
MOV
LDR
BX
foo
r14, [r13, #-0x0004]!
bar
r0, #0x00000000
r14, [r13], #0x0004
r14
Table 3: Checksum simulation comparisons using ––thumb.
Interprocedural Optimizations
Interprocedural optimizations [ 10] differ from other compiler optimizations in that they are performed by analyzing the entire program
or file as opposed to individual functions or blocks of code. One
example that we have already mentioned is inline function expansion
[ 8]. Other general examples include removing non-executing and/or
redundant code (also known as “dead code elimination”), simplifying
loops, and using memory more efficiently.
Many interprocedural optimizations have to do with optimizing
function calls. A function call can be considered to be a “tail-call” if
it is the last meaningful thing done by the calling function (other than
the function’s return instruction). Tail-call optimization can eliminate unnecessary returns and stack accesses in function hierarchies.
Consider the following C code:
int foo() {
return 1;}int bar() {
int x = foo();
return x; }The first noticeable difference between the outputs is that the
compiler will sometimes do some insignificant and wasteful things at
-O0 according to the procedure call standard mentioned before. Understanding this is not important here, but more information can be
found by exploring the standard [ 1]. The tail-call optimization attempts to replace branches that require return address overhead with
direct branches that do not need to return. Examining the first output shows that the call to foo in bar requires that the return address
register (r14) be first pushed on the stack so that the BL instruction
can safely overwrite it. This is not needed with a direct branch, which
can be used because there is no need to return to bar from foo. All
of this save stack and code space and decrease execution time. Also of
note are the stack instructions using r13 (which is the register defined
by the procedure call standard as the stack pointer) and the “!” signifier. The “!” indicates that the stack pointer will be updated with the
new value used to calculate the address accessed.
The results of the simulation can be seen in Table 4.
The function foo still returns normally so that other functions
can call it and return correctly. The transformations done by the tail-call optimization only apply to the calling function. Again, the other
differences seen are the result of the compiler adhering to the procedure call standard.
int main() {
int y = bar();
return 0; }
Tail-call optimization is only done by the ARM C compiler when
using -O1 or higher. When compiling this with -O0, the compiler
produces the following assembly:
foo
bar
main
MOV
BX
STR
BL
MOV
MOV
LDR
BX
STR
BL
MOV
MOV
LDR
BX
r0, #0x0000001
r14
r14, [r13, #-0x0004]!
foo
r1, r0
r0, r1
r14, [r13], #0x0004
r14
r14, [r13, #-0x0004]!
bar
r2, r0
r0, #0x00000000
r14, [r13], #0x0004
r14
Compare this with the output using -O1:
foo
MOV
BX
r0, #0x0000001
r14
Compiled with –O0
Compiled with –O1
Execution Time
(microseconds)
.217
.050
Program Size*
(bytes)
56
32
*Program size reflects object file that includes main and all associated
data, but not startup code.
Table 4: Bar simulation comparisons.
A Goal-Oriented Methodology
for Using Compiler Options
It is important to find the best combination of options for a specific
application during the development process. The following methodology is one way to do this and similar approaches can be taken. The
first step is to analyze the goals of the option selection process and
decide which criteria are most important to meet these goals. Some
criteria include:
• Execution Speed
Usually measured in MIPS or microseconds, it is the speed that
the processor can execute instructions and is dependent upon a
number of hardware and software factors.
• Application Code Size
The application code size is the amount of memory required to
hold the entire application code, including all associated data and