matter. Table 5 shows S for the experiment on Dhrystone 2.1 along
with the resulting metric sets Mn for each individual option. They
are listed in the same order that they appear in the ARM C compiler
documentation (alphabetically). Note that power consumption and
build time are not calculated.
Application code size is presented in bytes and reflects the entire
executable, including every generated object file. These files include
the main program, startup code, timer functions, library code, and all
associated data. Execution times are in seconds, reflecting 60,000 runs of
the Dhrystone 2.1 modules, and were recorded using the highest precision
that the simulator allows. Although some of the differences in
execution times might seem insignificant at first glance, they
accumulate in real-life applications that run for long periods.
The --apcs=interwork option generates code with ARM/Thumb
interworking and results in a significant improvement in code
size when used with the --thumb option. The --apcs=interwork/ropi
option enables read-only position-independent code, but the
interworking alone only has an impact at -O0. The --bss_threshold=0
option dictates where global data of eight bytes or less is placed in
memory, sometimes reducing the number of base pointer registers needed to
access that data. The --split_sections option tells the compiler
to generate an individual ARM image-defined code section for every
function in the source code, which would only reduce the code size when
used with -O0 [2].
Once the creation of S was complete, every possible combination of
the elements in S that included the default or higher level of general
optimization was applied when the application was compiled. In this
test case that was every possible combination of the elements of S that
included the -O2 or -O3 options. The Mn for each of these combinations was recorded. In all practical cases at least the default level of general optimization will be needed to find the best combination of options.
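The enumeration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the option lists below are a hypothetical stand-in for S (Table 5 defines the real set), and treating the two --apcs variants as mutually exclusive is an assumption of this sketch, not something the methodology prescribes. The count it prints follows from this assumed set and is not meant to reproduce the number of candidates used in the experiment.

```python
from itertools import combinations, product

# Hypothetical stand-in for S; the real set is the one listed in Table 5.
OPT_LEVELS = ["-O2", "-O3"]  # default or higher general optimization
# Assumption of this sketch: at most one --apcs variant per build.
APCS_CHOICES = [None, "--apcs=interwork", "--apcs=interwork/ropi"]
TOGGLES = ["--bss_threshold=0", "-Otime", "--split_sections", "--thumb"]


def candidate_combinations(opt_levels=OPT_LEVELS,
                           apcs_choices=APCS_CHOICES,
                           toggles=TOGGLES):
    """Yield every option combination containing exactly one optimization level."""
    # All subsets of the independent on/off options, including the empty one.
    subsets = [subset
               for r in range(len(toggles) + 1)
               for subset in combinations(toggles, r)]
    for level, apcs, subset in product(opt_levels, apcs_choices, subsets):
        combo = [level, *subset]
        if apcs is not None:
            combo.append(apcs)
        yield tuple(combo)


candidates = list(candidate_combinations())
# 2 levels * 3 apcs choices * 2**4 toggle subsets for this assumed set
print(len(candidates))
```

Each tuple can then be passed to the build step, and the resulting Mn recorded against it.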
Once the results were recorded, a scale factor needed to be determined
for each relevant criterion a, b, c, and d. A scale factor was not
needed for a criterion whose importance factor was zero. Let us call
criterion CR the criterion toward which other criteria were scaled (CR will
always have a scale factor of one). To calculate the scale factor for any
other criterion N, the average values of N and CR (both rounded to the
precision of the recorded data) were obtained. Then, the average
value of CR was divided by the average value of N, so that multiplying a
value of N by its scale factor brings it to the same magnitude as CR. In
this experiment, the criteria were scaled toward execution time. The
average execution time (rounded to the same precision as our criterion
data) was 6.81108871 seconds and the average application code size was
93,030 bytes, resulting in a scale factor of 0.00007321 for code size.
This scale factor was also rounded to the same precision as the criterion
data. Scale factors for the other criteria were calculated in the same way
using the same CR. In this experiment there was only one scale factor to
calculate.
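As a quick check of the arithmetic, the reported scale factor can be reproduced from the two averages given in the text. The quotient that yields the published value is the average execution time (CR) divided by the average code size (N). The function and variable names below are illustrative only.

```python
def scale_factor(avg_reference, avg_other, digits=8):
    """Scale factor that brings criterion N to the magnitude of reference CR."""
    return round(avg_reference / avg_other, digits)


avg_exec_time_s = 6.81108871  # average execution time from the text (CR)
avg_code_size_bytes = 93030   # average code size from the text (N)

exec_time_scale = 1.0         # CR always has a scale factor of one
code_size_scale = scale_factor(avg_exec_time_s, avg_code_size_bytes)

print(f"{code_size_scale:.8f}")  # 0.00007321
```

Multiplying a code size near the average by this factor gives a number near 6.8, which is what lets the two criteria be added in a single score.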
An overall value or “score” (Vn) was determined for each set Mn
using importance factors and scale factors with the following equation:
$$V_n = a_n X_1 Y_1 + b_n X_2 Y_2 + c_n X_3 Y_3 + d_n X_4 Y_4$$
$X_1$ = scale factor for a
$X_2$ = scale factor for b
$X_3$ = scale factor for c
$X_4$ = scale factor for d
$Y_1$ = importance factor for a
$Y_2$ = importance factor for b
$Y_3$ = importance factor for c
$Y_4$ = importance factor for d
The combination whose Vn was smallest in magnitude (there could
have been more than one) was taken as the best compiler option
combination for the stated goals according to the methodology.
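The scoring and selection step can be sketched as follows. The measurements and importance factors here are hypothetical placeholders chosen only to make the example run; they are not results from the experiment. Only two criteria appear, on the assumption that any criterion with an importance factor of zero is simply left out of the sum.

```python
def score(metrics, scales, importances):
    """Vn: sum over criteria of value * scale factor * importance factor."""
    return sum(m * x * y for m, x, y in zip(metrics, scales, importances))


# Hypothetical Mn values: (execution time in seconds, code size in bytes).
measurements = {
    "-O2 --thumb": (7.10, 90000),
    "-O3 -Otime": (6.50, 98000),
    "-O3 -Otime --thumb": (6.60, 91000),
}

scales = (1.0, 0.00007321)  # execution time is CR; code size scale from the text
importances = (0.7, 0.3)    # hypothetical weights favoring speed

scores = {options: score(values, scales, importances)
          for options, values in measurements.items()}

# A lower Vn is better, so rank in ascending order and take the first entry.
ranking = sorted(scores, key=scores.get)
best = ranking[0]
print(best)
```

Changing the importance factors changes the ranking, which is why the outcome depends on the optimization goals.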
Table 6 shows the best ten compiler option combinations for the
experiment based on Vn.
It should be no surprise that the best combinations
depend on the optimization goals. These combinations usually contain
a large number of options, since the default options alone try to
maintain a balance between the debug view and optimization.
In this case, using Thumb code had a large impact on
code size, shown by the fact that the --thumb option was part of
every combination in the top ten.
Because of the heavier emphasis on speed than on code size, the
best possible combination for pure speed ended up being the best
overall combination as measured by Vn. Conversely, the best possible
combination for pure code size ranked 41st among the 80 candidate
combinations (the candidates being only those combinations that
included -O2 or -O3).
A high-level flow chart for the presented methodology is shown
in Figure 2.
The only way to know the optimal combination of options is to test
them intelligently based on a set of goals and how important those
goals are in relation to each other. This is precisely what the presented
methodology does. However, its weakness lies in the fact that it is
Compiler Option Combination    Code Size    Execution Speed
--bss_threshold=0 -O3 -Otime --split_sections --thumb --apcs=interwork
-O3 -Otime --split_sections --thumb --apcs=interwork
--apcs=interwork/ropi --bss_threshold=0 -O3 -Otime --split_sections --thumb
--apcs=interwork/ropi -O3 -Otime --split_sections --thumb
--bss_threshold=0 -O3 -Otime --thumb --apcs=interwork
--bss_threshold=0 -O2 -Otime --split_sections --thumb --apcs=interwork
-O3 -Otime --thumb --apcs=interwork
-O2 -Otime --split_sections --thumb --apcs=interwork
--bss_threshold=0 -O2 -Otime --thumb --apcs=interwork
--apcs=interwork/ropi --bss_threshold=0 -O3 -Otime --thumb
Table 6: Ten most optimal option combinations for methodology applied to modified Dhrystone 2.1.