Doi: 10.1145/1995376.1995400
Abstracting Abstract Machines
A Systematic Approach to Higher-Order Program Analysis
By David Van Horn and Matthew Might
abstract
predictive models are fundamental to engineering reliable software systems. however, designing conservative,
computable approximations for the behavior of programs
(static analyses) remains a difficult and error-prone process
for modern high-level programming languages. What analysis designers need is a principled method for navigating
the gap between semantics and analytic models: analysis
designers need a method that tames the interaction of complex languages features such as higher-order functions,
recursion, exceptions, continuations, objects and dynamic
allocation.
We contribute a systematic approach to program analysis that yields novel and transparently sound static analyses. Our approach relies on existing derivational techniques
to transform high-level language semantics into low-level
deterministic state-transition systems (with potentially
infinite state spaces). We then perform a series of simple machine refactorings to obtain a sound, computable
approximation, which takes the form of a
non-deterministic state-transition systems with finite state spaces. the
approach scales up uniformly to enable program analysis
of realistic language features, including higher-order functions, tail calls, conditionals, side effects, exceptions, first-class continuations, and even garbage collection.
1. intRoDUCtion
Software engineering, compiler optimizations, program
parallelization, system verification, and security assurance depend on program analysis, a ubiquitous and central theme of programming language research. At the
same time, the production of modern software systems
employs expressive, higher-order languages such as Java,
JavaScript, C#, Python, Ruby, etc., implying a growing
need for fast, precise, and scalable higher-order program
analyses.
Program analysis aims to soundly predict proper-
ties of programs before being run. (Sound in program
analysis means “conservative approximation”: if a
sound analysis says a program must not exhibit behav-
ior, then that program will not exhibit that behavior;
but if a sound analysis says a program may exhibit a
behavior, then it may or may not exhibit that behav-
ior.) For over 30 years, the research community has
expended significant effort designing effective analy-
ses for higher-order programs.
13 Past approaches have
focused on connecting high-level language seman-
tics such as structured operational semantics, deno-
tational semantics, or reduction semantics to equally
high-level but dissimilar analytic models. These models
are too often far removed from their programming lan-
guage counterparts and take the form of constraint
languages specified as relations on sets of program
fragments.
12, 18, 25 These approaches require significant
ingenuity in their design and involve complex construc-
tions and correctness arguments, making it difficult
to establish soundness, design algorithms, or grow
the language under analysis. Moreover, such analytic
models, which focus on “value flow,” i.e., determin-
ing which syntactic values may show up at which pro-
gram sites at run-time, have a limited capacity to reason
about many low-level intensional properties such as mem-
ory management, stack behavior, or trace-based proper-
ties of computation. Consequently, higher-order program
analysis has had limited impact on large-scale systems,
despite the apparent potential for program analysis to aid
in the construction of reliable and efficient software.