Table 1: Object-oriented complexity metrics (per binary); from an internal Microsoft Research document by Murphy, B. and Nagappan, N. Characterizing Vista Development, December 15, 2006.
Metric                     Vista/Win 2003 (mean per binary)
Total functions            1.45
Max class methods          1.22
Total class methods        1.59
Max inheritance depth      1.33
Total inheritance depth    1.54
Max subclasses             3.87
Total subclasses           2.27
dication is that array bounds and null
pointer checks impose a time overhead
of approximately 4.5% in the Singularity OS. 1 Also important, and equally difficult to measure, are the performance
consequences of improved software-engineering practices (such as layering
software architecture and modularizing systems to improve development
and allow subsets).
Meanwhile, the data manipulated by
computers is also evolving, from simple
ASCII text to larger, structured objects
(such as Word and Excel documents), to
compressed documents (such as JPEG
images), and more recently to space- and computation-inefficient formats
(such as XML). The growing popularity
of video introduces yet another format
that is even more computationally expensive to manipulate.
Programming changes. Over the
past 30 years, programming languages
have evolved from assembly language
and C code to increased use of higher-
level languages. A major step was C++,
which introduced object-oriented
mechanisms (such as virtual-method
dispatch). C++ also introduced abstraction mechanisms (such as classes and
templates) that made possible rich libraries (such as the Standard Template
Library). These language mechanisms
required non-trivial, opaque runtime
implementations that could be expensive to execute but improved software
development through modularity, information hiding, and increased code
reuse. In turn, these practices enabled
the construction of ever-larger and
more complex software.
Table 1 compares several key object-oriented complexity metrics between
Windows 2003 and Vista, showing increased use of object-oriented features.
For example, the number of classes per
binary component increased 59% and
the number of subclasses per binary
127% between the two systems.
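
The percentages quoted here follow from the Vista/Win 2003 ratios in Table 1: a ratio r corresponds to an increase of (r - 1) x 100%. A minimal Python sketch of that arithmetic follows; the helper name `percent_increase` is illustrative and not from the source.

```python
def percent_increase(ratio: float) -> int:
    """Convert a new/old ratio into a whole-number percentage increase."""
    return round((ratio - 1.0) * 100)

# Ratios taken from Table 1 (Vista relative to Windows 2003, mean per binary).
print(percent_increase(1.59))  # total class methods -> 59
print(percent_increase(2.27))  # total subclasses -> 127
```
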
These changes could have performance consequences. Comparing the
SPEC CPU2000 and CPU2006 benchmarks, Kejariwal et al. 12 attributed the
lower performance of the newer suite
to increased complexity and size due
to the inclusion of six new C++ benchmarks and enhancements to existing
programs.
Safe, managed languages (such as
C# and Java) further increased the
level of programming by introducing
garbage collection, richer class libraries (such as .NET and the Java Class
Library), just-in-time compilation, and
runtime reflection. All these features
provide powerful abstractions for developing software but also consume
memory and processor resources in
nonobvious ways.
Language features can affect performance in two ways: The first is that
a mechanism can be costly, even when
not being used. Program reflection, a
well-known example of a costly language
feature, requires a runtime system to
maintain a large amount of metadata
on every method and class, even if the
reflection features are not invoked. The
second is that high-level languages hide
details of a machine beneath a more abstract programming model. This leaves
developers less aware of performance
considerations and less able to understand and correct problems.
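
As a rough illustration of the first point, the sketch below uses Python, which is not one of the languages discussed here and is chosen only for convenience, to show that a runtime supporting reflection holds per-class and per-method metadata whether or not the program ever inspects it. The class `Greeter` is hypothetical, and the sizes reported will vary by interpreter version.

```python
import sys

class Greeter:
    """Hypothetical class that never uses reflection itself."""

    def greet(self, name):
        return "Hello " + name

# The program's actual work: no introspection involved.
message = Greeter().greet("World")

# Yet the runtime already holds metadata for the class and its method.
method = Greeter.greet
metadata = {
    "class name": Greeter.__name__,
    "method name": method.__name__,
    "parameter names": method.__code__.co_varnames,
    "class namespace entries": sorted(vars(Greeter)),
}
for key, value in metadata.items():
    print(f"{key}: {value}")

# Size of the class object alone (excluding what it references), in bytes.
print("class object bytes:", sys.getsizeof(Greeter))
```

The metadata dictionary is built only to make the hidden state visible; the interpreter maintains the same information even if these lines are deleted.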
Mitchell et al. 16 analyzed the conversion of a date object in SOAP format to a
Java Date object in IBM’s Trade benchmark, a sample business application
built on IBM Websphere. The conversion entailed 268 method calls and
allocation of 70 objects. Jann et al. 11
analyzed this benchmark on consecutive implementations of IBM’s POWER
architecture, observing that “modern
e-commerce applications are increasingly built out of easy-to-program, generalized but nonoptimized software
components, resulting in substantive
stress on the memory and storage subsystems of the computer.”
I conducted simple programming experiments to compare the cost of implementing the archetypical Hello World
program using various languages and
features. Table 2 compares C and C# versions of the program, showing the latter
has a working set 4.7–5.2 times larger.
Another experiment measured the cost
of displaying the string “Hello World”
by both writing it to a console window
and displaying it in a pop-up window.
Table 3 shows that a dialog box is 20.7
times more computationally costly in
C++ (using Microsoft Foundation Classes)
and 30.6 times more costly in C# (using
Windows Forms). By comparison, the
choice of language and runtime system
made relatively little difference, as C#
was only 1.5 times more costly than C++
for the console and 2.2 times more costly with a window.
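
The ratios quoted in this passage can be rechecked from the figures in Tables 2 and 3. The Python sketch below is only a verification aid under two assumptions: the C# console entry in Table 3 is read as 2,628 timer cycles, and the second row of Table 2 is taken to be the C# version, as the text states. All variable and function names are illustrative.

```python
# Timer-cycle counts from Table 3 (one cycle = 280ns per the table header).
# Assumption: the C# console entry is 2,628 cycles.
cycles = {
    ("C++", "console"): 1_760,
    ("C++", "window"): 36_375,
    ("C#", "console"): 2_628,
    ("C#", "window"): 80_348,
}
# Working sets in KB from Table 2: (C, C#) for debug and optimized builds.
working_set_kb = {"debug": (1_424, 6_756), "optimized": (1_304, 6_748)}

def ratio(numerator, denominator):
    """Return numerator/denominator rounded to one decimal place."""
    return round(numerator / denominator, 1)

# Cost of a window relative to the console, per language.
window_vs_console = {
    lang: ratio(cycles[(lang, "window")], cycles[(lang, "console")])
    for lang in ("C++", "C#")
}
# Cost of C# relative to C++, per display mechanism.
csharp_vs_cpp = {
    mech: ratio(cycles[("C#", mech)], cycles[("C++", mech)])
    for mech in ("console", "window")
}
# Working set of C# relative to C, per build type.
csharp_vs_c_working_set = {
    build: ratio(cs, c) for build, (c, cs) in working_set_kb.items()
}

print(window_vs_console)        # {'C++': 20.7, 'C#': 30.6}
print(csharp_vs_cpp)            # {'console': 1.5, 'window': 2.2}
print(csharp_vs_c_working_set)  # {'debug': 4.7, 'optimized': 5.2}
```
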

Table 2: Hello World benchmark running on Intel x86, Vista Enterprise, and Visual Studio 2008.
Language    Debug build                    Optimized build
            Working set   Startup bytes    Working set   Startup bytes
C           1,424K        6,162            1,304K        5,874
C#          6,756K        113,280          6,748K        87, 62

Table 3: Execution cost of displaying “Hello World” string.
Mechanism       Timer cycles (280ns)
C++, console    1,760
C++, window     36,375
C#, console     2,628
C#, window      80,348

This disparity is not a criticism
of C#, .NET, or window systems; the