Compiled with –Ospace, O3

Compiled with –Ospace, O3, --thumb

Execution Time
(microseconds)
2.467
3.050

Program Size*

(bytes)

134

104

bar
main

*Program size reflects object file that includes main and all associated data, but not startup code.

B
STR
BL
MOV
LDR
BX
foo
r14, [r13, #-0x0004]!
bar
r0, #0x00000000
r14, [r13], #0x0004
r14

Table 3: Checksum simulation comparisons using ––thumb.

Interprocedural Optimizations

Interprocedural optimizations [ 10] differ from other compiler optimizations in that they are performed by analyzing the entire program or file as opposed to individual functions or blocks of code. One example that we have already mentioned is inline function expansion [ 8]. Other general examples include removing non-executing and/or redundant code (also known as “dead code elimination”), simplifying loops, and using memory more efficiently.

Many interprocedural optimizations have to do with optimizing function calls. A function call can be considered to be a “tail-call” if it is the last meaningful thing done by the calling function (other than the function’s return instruction). Tail-call optimization can eliminate unnecessary returns and stack accesses in function hierarchies. Consider the following C code:

int foo() {
  return 1;}
int bar() {
  int x = foo();
  return x; }

The first noticeable difference between the outputs is that the compiler will sometimes do some insignificant and wasteful things at -O0 according to the procedure call standard mentioned before. Understanding this is not important here, but more information can be found by exploring the standard [ 1]. The tail-call optimization attempts to replace branches that require return address overhead with direct branches that do not need to return. Examining the first output shows that the call to foo in bar requires that the return address register (r14) be first pushed on the stack so that the BL instruction can safely overwrite it. This is not needed with a direct branch, which can be used because there is no need to return to bar from foo. All of this save stack and code space and decrease execution time. Also of note are the stack instructions using r13 (which is the register defined by the procedure call standard as the stack pointer) and the “!” signifier. The “!” indicates that the stack pointer will be updated with the new value used to calculate the address accessed.

The results of the simulation can be seen in Table 4.

The function foo still returns normally so that other functions can call it and return correctly. The transformations done by the tail-call optimization only apply to the calling function. Again, the other differences seen are the result of the compiler adhering to the procedure call standard.

int main() {
  int y = bar();
  return 0; }

 

Tail-call optimization is only done by the ARM C compiler when using -O1 or higher. When compiling this with -O0, the compiler produces the following assembly:

foo
bar
main
MOV
BX
STR
BL
MOV
MOV
LDR
BX
STR
BL
MOV
MOV
LDR
BX
r0, #0x0000001
r14
r14, [r13, #-0x0004]!
foo
r1, r0
r0, r1
r14, [r13], #0x0004
r14
r14, [r13, #-0x0004]!
bar
r2, r0
r0, #0x00000000
r14, [r13], #0x0004
r14

 

Compare this with the output using -O1:

foo
MOV
BX
r0, #0x0000001
r14

Compiled with O0 Compiled with O1

Execution Time
(microseconds)
.217
.050

Program Size*

(bytes)

56 32

*Program size reflects object file that includes main and all associated data, but not startup code.

Table 4: Bar simulation comparisons.

A Goal-Oriented Methodology
for Using Compiler Options

It is important to find the best combination of options for a specific application during the development process. The following methodology is one way to do this and similar approaches can be taken. The first step is to analyze the goals of the option selection process and decide which criteria are most important to meet these goals. Some criteria include:

Execution Speed

Usually measured in MIPS or microseconds, it is the speed that the processor can execute instructions and is dependent upon a number of hardware and software factors.

Application Code Size

The application code size is the amount of memory required to hold the entire application code, including all associated data and

References:

http://www.acm.org/crossroads

Archives