lowing conversation with a potential
customer:
“How do we run your tool?”
“Just type ‘make’ and we’ll rewrite
its output.”
“What’s ‘make’? We use ClearCase.”
“Uh, what’s ClearCase?”
This turned out to be a chasm we
couldn’t cross. (Strictly speaking, the
customer used ‘ClearMake,’ but the
superficial similarities in name are
entirely unhelpful at the technical level.)
We skipped that company and went
to a few others. They exposed other
problems with our method, which we
papered over with 90% hacks. None
seemed so troublesome as to force us
to rethink the approach—at least until
we got the following support call from
a large customer:
“Why is it when I run your tool, I
have to reinstall my Linux distribution
from CD?”
This was indeed a puzzling question.
Some poking around exposed the
following chain of events: the company’s
make used a novel format to print
out the absolute path of the directory
in which the compiler ran; our script
misparsed this path, producing the
empty string that we gave as the destination
to the Unix “cd” (change directory)
command, causing it to change
to the top level of the system; it ran
“rm -rf *” (recursive delete) during
compilation to clean up temporary
files; and the build process ran as root.
Summing these points produces the
removal of all files on the system.
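The chain of events above suggests one defensive rule: a path recovered from build output should be validated before anything acts on it. The following is a minimal sketch in Python, not the script described in the article. It assumes the log line follows GNU make's "Entering directory" message; the names `parse_entered_dir` and `enter_dir_or_fail` and the regular expression are hypothetical.

```python
import os
import re

# Assumed message shape: make: Entering directory '/abs/path'
# (older GNU make versions open the quote with a backtick).
_ENTER_RE = re.compile(r"Entering directory [`'](?P<path>[^']*)'")


def parse_entered_dir(line):
    """Return the absolute directory named in a make log line, or None.

    Anything that does not match, is empty, or is not absolute is
    rejected instead of being handed to a later chdir.
    """
    match = _ENTER_RE.search(line)
    if match is None:
        return None
    path = match.group("path")
    if not path or not os.path.isabs(path):
        return None
    return path


def enter_dir_or_fail(line):
    """Change directory only when the log line yields a usable path."""
    path = parse_entered_dir(line)
    if path is None:
        raise ValueError(f"refusing to change directory: cannot parse {line!r}")
    os.chdir(path)
    return path
```

With a check like this, an unfamiliar log format stops the run with an error rather than silently producing an empty destination.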
The right approach, which we have
used for the past seven years, kicks off
the build process and intercepts every
system call it invokes. As a result, we can
see everything needed for checking, including the exact executables invoked,
their command lines, the directory
they run in, and the version of the compiler (needed for compiler-bug workarounds). This control makes it easy to
grab and precisely check all source code,
to the extent of automatically changing
the language dialect on a per-file basis.
To invoke our tool, users need only
call it with their build command as an
argument:
cov-build <build command>
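To make concrete what interception yields, the sketch below models the per-process information the text lists (exact executable, command line, working directory) and shows how compiler invocations and their source files might be picked out of it. This is an illustration only: the `ExecRecord` structure, the `COMPILER_NAMES` set, and both helper functions are hypothetical, and the actual system-call interception is not shown.

```python
import os
from dataclasses import dataclass

# Hypothetical set of compiler names; a real tool would need a far longer list.
COMPILER_NAMES = {"cc", "gcc", "g++", "clang", "clang++"}


@dataclass(frozen=True)
class ExecRecord:
    """One intercepted process launch (illustrative structure)."""
    executable: str   # exact path of the program that was run
    argv: tuple       # full command line, argv[0] included
    cwd: str          # directory the process ran in


def compiler_invocations(records):
    """Keep only the records whose executable looks like a compiler."""
    return [r for r in records
            if os.path.basename(r.executable) in COMPILER_NAMES]


def source_files(record, extensions=(".c", ".cc", ".cpp")):
    """Resolve the source files on a command line against the recorded cwd."""
    return [os.path.normpath(os.path.join(record.cwd, arg))
            for arg in record.argv[1:]
            if arg.endswith(extensions)]
```

Because each record carries its own working directory, relative source paths can be resolved without parsing any build-tool output.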
We thought this approach was bulletproof. Unfortunately, as the astute
reader has noted, it requires a command
prompt. Soon after implementing it, we
went to a large company, so large it had
a hyperspecialized build engineer, who
engaged in the following dialogue:
“How do I run your tool?”
“Oh, it’s easy. Just type ‘cov-build’
before your build command.”
“Build command? I just push this
[GUI] button...”
Social vs. technical. The social restriction that you cannot change anything,
no matter how broken it may be, forces
ugly workarounds. A representative example: build interposition on Windows requires running the compiler in
the debugger. Unfortunately, doing so
causes a very popular Windows C++ compiler (Visual Studio C++ .NET 2003) to
prematurely exit with a bizarre error
message. After some high-stress fussing, we found that the compiler has a
use-after-free bug, hit when code uses a
Microsoft-specific C language extension
(certain invocations of its #using directive). The compiler runs fine in normal
use; when it reads the freed memory,
the original contents are still there, so
everything works. However, when run
with the debugger, the compiler switches to using a “debug malloc,” which on
each free call sets the freed memory
contents to a garbage value. The subsequent read returns this value, and the
compiler blows up with a fatal error.
The sufficiently perverse reader can no
doubt guess the “solution.”ᵃ
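The difference between the two allocator behaviors can be simulated without invoking real undefined behavior. The sketch below is a toy model in Python, not the compiler's or Microsoft's allocator: `ToyHeap`, `read_after_free`, and the fill byte `POISON` are assumptions made for illustration (the article says only that freed memory is set to "a garbage value").

```python
POISON = 0xDD  # assumed fill byte; the article only says "a garbage value"


class ToyHeap:
    """A toy allocator over a bytearray; it never reuses freed blocks."""

    def __init__(self, size, debug=False):
        self.memory = bytearray(size)
        self.debug = debug
        self.next_free = 0
        self.sizes = {}  # start address -> block length

    def malloc(self, n):
        if self.next_free + n > len(self.memory):
            raise MemoryError("toy heap exhausted")
        addr = self.next_free
        self.next_free += n
        self.sizes[addr] = n
        return addr

    def free(self, addr):
        n = self.sizes.pop(addr)
        if self.debug:
            # Debug allocators overwrite freed blocks to expose stale reads.
            self.memory[addr:addr + n] = bytes([POISON]) * n

    def write(self, addr, data):
        self.memory[addr:addr + len(data)] = data

    def read(self, addr, n):
        return bytes(self.memory[addr:addr + n])


def read_after_free(heap):
    """Store a value, free its block, then read it anyway (the bug)."""
    addr = heap.malloc(4)
    heap.write(addr, b"\x2a\x00\x00\x00")
    heap.free(addr)
    return heap.read(addr, 4)
```

In normal mode the stale read still returns the original bytes, so the bug stays hidden; with `debug=True` the same read returns the fill pattern, which is the behavior that made the compiler fail under the debugger.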
Law: You can’t check code you can’t
parse. Checking code deeply requires
understanding the code’s semantics.
The most basic requirement is that you
parse it. Parsing is considered a solved
problem. Unfortunately, this view is naïve, rooted in the widely believed myth
that programming languages exist.
The C language does not exist; neither do Java, C++, and C#. While a
language may exist as an abstract idea,
and even have a pile of paper (a standard) purporting to define it, a standard is not a compiler. What language
do people write code in? The character
strings accepted by their compiler.
Further, they equate compilation with
certification. A file their compiler does
ᵃ Immediately after process startup, our tool
writes 0 to the memory location of the “in debugger” variable that the compiler checks to
decide whether to use the debug malloc.