Abstract

Optimizing embedded applications using a compiler can generally be broken down into two major categories: hand-optimizing code to take advantage of a particular processor’s compiler and applying built-in optimization options to proven and well-polished code. The former is well documented for different processors, but little has been done to find generalized methods for optimal sets of compiler options based on common goal criteria such as application code size, execution speed, power consumption, and build time. This article discusses the fundamental differences between these two general categories of optimizations using the compiler. Examples of common, built-in compiler options are presented using a simulated ARM processor and C compiler, along with a simple methodology that can be applied to any embedded compiler for finding an optimal set of compiler options.
Even with the advent of the first general-purpose computers in the 1940s, a need arose for machines designed to perform a few dedicated tasks in real time. This need gave birth to the world’s first embedded systems. In 1961, Charles Stark Draper developed the Apollo Guidance Computer, generally recognized as the first modern embedded computer system, at the MIT Instrumentation Lab [6]. Today’s definition of an embedded system would include the use of a microprocessor, which became commercially feasible around 1971 [4], allowing smaller computers to aid in making a phone call, performing surgery, or playing a game.
As embedded processor architectures have become more complicated, programmers have become more dependent on the compiler’s knowledge of the processor’s instruction sets, pipelines, and complex memory systems. It is a common misconception that faster, more complex processors diminish the need for better compilers. Compiler technology must necessarily advance to take advantage of new processor features. Using a good compiler optimally can not only make code smaller and faster, but also bring financial gains during development in three basic ways:
Decreasing code size ultimately reduces the amount of memory the system needs. Although memory has become significantly cheaper over time, it remains one of the most expensive components of an embedded system.
Increasing the performance of the software enables engineers to use slower, more cost-efficient processors.
As compilers become more powerful, time-consuming hand optimization has become less important. Also, it has become essential to write readable, modularly structured, and maintainable code for the purpose of portability and reuse.
Let us first explore the differences between compiling (and writing) hand-optimized, high-level code for a specific architecture and using the built-in compiler optimizations that can be applied to arbitrary pieces of code. While this article touches on the differences between these two general classes of optimization techniques, it focuses mainly on the latter. Built-in compiler optimizations include one or more of the following: options set at the command line or in the front end of the tools; compiler-recognized keywords used at the source level (for example, using __packed might tell a compiler to store some declared global data in memory without padding); and automatic optimizations that the compiler will always perform. The options (also known as switches) can be applied to any piece of code, whereas the targeted, hand-optimization techniques only apply to certain aspects of code. Of course, the compiler output could vary depending on the code and switch settings, but it is possible to get the same compiler output from basic pieces of code using different architectural- and processor-specific compiler switches. This article also presents a methodology for selecting and using these built-in switches. Think of this article as more of a guide to optimizing polished source code using only compiler options (which are one step of an entire optimization process) rather than a guide to writing optimized high-level code for a specific processor.
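As a rough illustration of a source-level keyword, the sketch below compares a structure in its default layout with a packed version of the same structure. It assumes a GCC- or Clang-compatible compiler and uses `__attribute__((packed))` as a stand-in for a vendor keyword such as __packed; the structure and function names are hypothetical and do not come from any particular toolchain.

```c
#include <stddef.h>
#include <stdint.h>

/* Default layout: the compiler may insert padding so that each
   member sits on its natural alignment boundary. */
struct sample_default {
    uint8_t  flag;
    uint32_t value;
    uint8_t  mode;
};

/* Packed layout (assumption: GCC/Clang attribute syntax, used here
   in place of a vendor keyword such as __packed). Members are stored
   back to back with no padding between them. */
struct __attribute__((packed)) sample_packed {
    uint8_t  flag;
    uint32_t value;
    uint8_t  mode;
};

/* Number of bytes saved per object by packing (hypothetical helper). */
size_t packing_savings(void) {
    return sizeof(struct sample_default) - sizeof(struct sample_packed);
}
```

The packed structure occupies 6 bytes, while the default layout is typically larger because `value` is usually aligned to a 4-byte boundary. The saving is not free: on some cores, access to an unaligned member may be slower or may need extra instructions, so this is a size-versus-speed trade-off rather than a pure gain.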
First, it is worth having a brief look at the idea of writing code to run on a specific processor or architecture. This should not be confused with using built-in architectural- and processor-specific optimization switches that are shown later.
Hand Optimizing High-Level, Targeted Code

One simple example of writing ARM-targeted C can be found in coding loops. In C, it is common practice to implement a simple for loop that increments a counter as:
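int countUp() {
    unsigned int i;
    int sum = 0;
    for (i = 0; i < 100; i++) sum += i; /* assumed: the source listing is cut off here; the bound of 100 is illustrative */
    return sum;
}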
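A common hand optimization for loops on ARM is to count down to zero instead of counting up. The sketch below shows both forms side by side so they can be compiled and compared. It is only an illustration: the function names and the parameter `n` are hypothetical, and whether the count-down form actually produces tighter code depends on the compiler and the optimization level in use.

```c
/* Count-up form: each iteration compares the counter against n.
   (Assumes n is less than UINT_MAX so the loop terminates.) */
unsigned int sum_count_up(unsigned int n) {
    unsigned int i;
    unsigned int sum = 0;
    for (i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* Count-down form: the loop ends when the counter reaches zero,
   which can let the compiler fold the termination test into the
   decrement (for example, a flag-setting subtract followed by a
   conditional branch) instead of emitting a separate compare. */
unsigned int sum_count_down(unsigned int n) {
    unsigned int i;
    unsigned int sum = 0;
    for (i = n; i != 0; i--) {
        sum += i;
    }
    return sum;
}
```

Both functions return the same result for the same `n`, so the two builds can be compared by inspecting the generated assembly or the object code size. Many current compilers perform this transformation on their own at higher optimization levels, in which case the two forms may compile to identical code.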