matter. Table 5 shows S for the experiment on Dhrystone 2.1, along with the resulting metric sets Mn for each individual option. The options are listed in the order in which they appear in the ARM C compiler documentation (alphabetically). Note that power consumption and build time were not calculated.
Application code size is presented in bytes and reflects the entire executable, including every generated object file. These files include the main, startup code, timer functions, library code, and all associated data. Execution times are in seconds, reflecting 60,000 runs of the Dhrystone 2.1 modules, and were recorded using the highest precision that the simulator allows. Although some of the differences in execution times might seem insignificant at first glance, they accumulate over time in real-life applications.
The --apcs=interwork option generates code with ARM/Thumb interworking and results in a significant improvement in code size when used with the --thumb option. The --apcs=interwork/ropi option enables read-only position-independent code, but the interworking alone only has an impact at -O0. The --bss_threshold=0 option dictates where global data of eight bytes or less is placed in memory, sometimes reducing the number of base pointer registers needed to access that data. The --split_sections option tells the compiler to generate individual ARM image-defined code sections for every function of the source code, which would only lessen the code size when used with -O0 [2].
Once the creation of S was complete, every possible combination of the elements in S that included the default or higher level of general optimization was applied when the application was compiled. In this test case that was every possible combination of the elements of S that included the -O2 or -O3 options. The Mn for each of these combinations was recorded. In all practical cases at least the default level of general optimization will be needed to find the best combination of options.
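The enumeration step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the option lists below are a hypothetical subset of S (the real S is the one in Table 5), and the sketch assumes that exactly one general optimization level is chosen per build. The number of candidates it produces therefore depends on those assumptions and is not expected to match the count in the experiment.

```python
from itertools import combinations

# Hypothetical stand-in for S; the real set is the one listed in Table 5.
OPT_LEVELS = ["-O2", "-O3"]  # default or higher level of general optimization
OTHER_OPTIONS = [
    "--apcs=interwork",
    "--bss_threshold=0",
    "-Otime",
    "--split_sections",
    "--thumb",
]


def candidate_combinations(opt_levels, other_options):
    """Yield every combination that contains exactly one optimization level
    plus any subset (including the empty one) of the remaining options."""
    for level in opt_levels:
        for k in range(len(other_options) + 1):
            for subset in combinations(other_options, k):
                yield (level,) + subset


candidates = list(candidate_combinations(OPT_LEVELS, OTHER_OPTIONS))
# 2 levels * 2**5 subsets of the other options = 64 candidates for this toy S
print(len(candidates))
```

Each tuple in `candidates` would then be passed to the compiler, and the resulting Mn recorded for that build.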
Once the results were recorded, a scale factor needed to be determined for each relevant criterion a, b, c, and d. A scale factor was not needed for a criterion whose importance factor was zero. Let us call criterion CR the criterion toward which the other criteria were scaled (CR will always have a scale factor of one). To calculate the scale factor for any other criterion N, the average values of N and CR (both rounded to the precision of the recorded data) were obtained. Then, the average value of CR was divided by the average value of N. In this experiment, the criteria were scaled toward execution time. The average execution time (rounded to the same precision as our criterion data) was 6.81108871 seconds and the average application code size was 93,030 bytes, resulting in a scale factor of 6.81108871 / 93,030 = 0.00007321 for code size. This scale factor was also rounded to the same precision as the criterion data. Scale factors for the other criteria were calculated in the same way using the same CR. In this experiment there was only one scale factor to calculate.
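As a quick check of the arithmetic, the scale factor calculation can be written as a small Python function. This is a minimal sketch: the function name and the choice of eight decimal places (matching the precision of the recorded execution times) are assumptions made for illustration.

```python
def scale_factor(cr_values, n_values, digits=8):
    """Scale criterion N toward reference criterion CR:
    average(CR) / average(N), rounded to the recorded precision."""
    avg_cr = round(sum(cr_values) / len(cr_values), digits)
    avg_n = round(sum(n_values) / len(n_values), digits)
    return round(avg_cr / avg_n, digits)


# Using the averages reported in the text directly:
# CR = execution time (seconds), N = code size (bytes).
code_size_scale = round(6.81108871 / 93030, 8)
print(f"{code_size_scale:.8f}")
```

Multiplying a code size by this factor brings it into the same numeric range as the execution times, so the two criteria can be combined into one score.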
An overall value or “score” (Vn) was determined for each set Mn using importance factors and scale factors with the following equation:
Vn = (X1 × Y1 × a) + (X2 × Y2 × b) + (X3 × Y3 × c) + (X4 × Y4 × d)
where:
X1 = scale factor for a
X2 = scale factor for b
X3 = scale factor for c
X4 = scale factor for d
Y1 = importance factor for a
Y2 = importance factor for b
Y3 = importance factor for c
Y4 = importance factor for d
The combination whose Vn was smallest in magnitude (there could have been more than one) was, according to the methodology, the optimal compiler option combination for the stated goals.
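The scoring and selection steps can be sketched as follows. This is an illustration rather than the authors' implementation: it assumes Vn is a weighted sum of each criterion multiplied by its scale factor and its importance factor, and the importance factors used here (0.75 for code size, 0.25 for execution time) are not stated in this section. They are assumed because, together with the 0.00007321 scale factor, they reproduce the Vn values reported for the experiment. The dictionary keys and function names are hypothetical.

```python
# Scale factors: execution time is the reference criterion CR (factor 1).
SCALE = {"code_size": 0.00007321, "exec_time": 1.0}

# Assumed importance factors (inferred from the reported Vn values,
# not stated in the text).
IMPORTANCE = {"code_size": 0.75, "exec_time": 0.25}


def score(metrics):
    """Weighted-sum score Vn for one metric set Mn; lower is better."""
    return sum(SCALE[k] * IMPORTANCE[k] * metrics[k] for k in metrics)


# Three of the measured metric sets (code size in bytes, time in seconds).
results = {
    "--bss_threshold=0 -O3 -Otime --split_sections --thumb":
        {"code_size": 92224, "exec_time": 5.29249510},
    "--apcs=interwork -O3 -Otime --split_sections --thumb":
        {"code_size": 92280, "exec_time": 5.29250382},
    "--apcs=interwork --bss_threshold=0 -O2 -Otime --split_sections --thumb":
        {"code_size": 92232, "exec_time": 5.34799947},
}

# The combination with the smallest Vn is the one the methodology selects.
best = min(results, key=lambda combo: score(results[combo]))
for combo, metrics in results.items():
    print(f"{score(metrics):.8f}  {combo}")
print("best:", best)
```

Criteria with an importance factor of zero simply drop out of the sum, which is why no scale factor is needed for them.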
Table 6 shows the best ten compiler option combinations for the experiment based on Vn.
It should be no surprise that the most optimal combinations depend on the optimization goals. These combinations usually contain a large number of options since the default options alone try to maintain a balance between the debug view and performing optimization. For this case, using Thumb code had a large impact on code size, shown by the fact that the --thumb option was a part of every combination in the top ten.
Because of the heavier emphasis on speed versus code size, the best possible combination for pure speed ended up being the best overall combination for producing Vn. Conversely, the best possible combination for pure code size ranked 41st out of the 80 candidate combinations (the candidates being only those combinations that included -O2 or -O3).
A high-level flow chart for the presented methodology is shown in Figure 2.
The only way to know the optimal combination of options is to test them intelligently based on a set of goals and how important those goals are in relation to each other. This is precisely what the presented methodology does. However, its weakness lies in the fact that it is
Compiler Option Combination | Code Size (a), bytes | Execution Speed (b), seconds | Vn
--bss_threshold=0 -O3 -Otime --split_sections --thumb | 92,224 | 5.29249510 | 6.38691306
--apcs=interwork -O3 -Otime --split_sections --thumb | 92,280 | 5.29250382 | 6.38999006
--apcs=interwork --apcs=interwork/ropi --bss_threshold=0 -O3 -Otime --split_sections --thumb | 92,400 | 5.29249525 | 6.39657681
--apcs=interwork/ropi -O3 -Otime --split_sections --thumb | 92,456 | 5.29250337 | 6.39965366
--bss_threshold=0 -O3 -Otime --thumb | 92,472 | 5.29249615 | 6.40053038
--apcs=interwork --bss_threshold=0 -O2 -Otime --split_sections --thumb | 92,232 | 5.34799947 | 6.40122841
--apcs=interwork -O3 -Otime --thumb | 92,528 | 5.29250487 | 6.40360738
--apcs=interwork -O2 -Otime --split_sections --thumb | 92,288 | 5.34800820 | 6.40430541
--apcs=interwork --bss_threshold=0 -O2 -Otime --thumb | 92,360 | 5.34800052 | 6.40825683
--apcs=interwork --apcs=interwork/ropi --bss_threshold=0 -O3 -Otime --thumb | 92,648 | 5.29249600 | 6.41019406
Table 6: Ten most optimal option combinations for methodology applied to modified Dhrystone 2.1.
References: