the effect on the heap and the taints when accessing them.
Vex function and object summaries are hence simplified
JavaScript objects and functions containing only the essential
functionality of the objects. For example, a JavaScript Array
object is defined in Figure 4 to be a function object with the
@Class,prototype, and @Proto properties initialized to
the string “Function”, identifier ArrayProt, and identifier
FunctionProt, respectively. The variables FunctionProt
and ArrayProt point to the prototype objects, which contain the various functions like length and push.
Browser’s dOm api and XpCOm components: Vex treats
most of the browser’s DOM API, and XPCOM components as
uninitialized variables, fields, and functions. However, Vex
provides explicit function summaries for the API components and objects that Vex needs to keep track of in order to
trace the flows to and from the objects. Vex analysis sets the
taint of the objects that represent insecure sources or those
that are dependent on insecure sources to High.
higher-order functions: Vex analysis accurately keeps track
of the objects and implements function calls by inlining
the function bodies according to the JavaScript semantics.
Higher-order functions calls are also inlined. Additionally,
Vex provides summaries for some higher-order functions
in the JavaScript API. For example, the settimeout function in JavaScript takes a callback function as its first argument. This function is represented in Vex as a function in
which the function body invokes the callback function in
the first argument.
dynamically generated code: The eval method in
JavaScript allows execution of dynamically formed code,
and is widely used in browser extensions. While an accurate analysis of the structure of dynamically created code is
a research topic in itself, and quite out of the scope of this
paper, the analysis cannot simply ignore eval statements.
Vex analysis performs a constant-string analysis for strings
and string operations. If the actual parameters to the eval
statement evaluate to a constant string, Vex’s static analysis engine parses these constant strings and inserts them
into the program flow just after the eval statement. This
ensures that these newly parsed statements are included in
the computation of the taint. In most correct extensions, an
eval-ed statement is dynamically chosen from a set of constant-strings or taken from trusted sources, and hence evaluate to a constant string on the path explored (and tracked
accurately by Vex). Parameters to eval, whose exact string
values are not statically inferred by Vex along the path
explored, are tested to check if they are tainted. If there is
a flow from an untrusted source to an eval, Vex will report
this flow, as it corresponds to a vulnerable flow pattern.
Object properties accessed in the form of associative arrays:
figure 4. Array object summary in Vex.
var Array = function(){
this.@Class = "Function";
this.prototype = ArrayProt;
this.@Proto = Function Prot;};
In JavaScript, objects are treated as associative arrays. This
means that any property of the object can be accessed using
the array notation. Array indices could be constant strings,
which are then evaluated to get the actual property being
accessed; or they could be numbers, which indicate the
property number that is being accessed; or they could be
variables, that could be instantiated at run time. If Vex cannot evaluate the array index to a property name for any reason, the array access conservatively gets the taints of every
property in the parent array object.
functions that take arbitrary number of arguments: Some
functions in JavaScript can have variable numbers of arguments. For example, the push method of an array can be
called with any number of arguments and the arguments
will be appended to the end of that array. To handle this in
Vex, the object representing the push method has a special
property indicating that it can take a variable number of arguments and when the method is called, Vex analysis conservatively appends the taints of all the arguments to the push
method to the array object on which the method is called.
4. 3. a note on soundness
Most static analysis tools, such as those used in compilers
and those used in abstract interpretation, over-approximate
the concrete semantics, and hence are sound. In the context
of flow analysis, a sound tool never reports that a program
has no flows when it has one. Soundness often entails a large
number of false positives, i.e., flows that are reported by the
tool but may not actually ever happen during execution.
Vex is not sound. We believe that a sound state-of-the-art analysis tool for JavaScript extensions would overwhelm
and frustrate the tool’s users with a torrent of false positives. Thus to handle certain features of JavaScript without producing excessively many false positives, we chose
not to make Vex sound. As a consequence, for example,
a maliciously written extension could quite easily evade
detection by Vex. On the other hand, a maliciously written
extension can easily harm its users directly, without any
input from untrusted Web pages. This underlies the reason
why our threat model assumes that the extension author is
not malicious.
Instead of aiming for soundness, we concentrated on
making Vex fairly accurate on paths in the program, without collapsing (merging) the nodes of the heap in any way.
However, since Vex can only analyze a finite number of paths
in the program (obtained by unrolling recursion a bounded
number of times) in this accurate manner, the analysis Vex
performs is inherently not sound.
False positives are also, of course, still possible in Vex, i.e.,
Vex may report flows that actually do not exist in the program.
This stems from the fact that the analysis uses an abstraction. In particular, not having precise enough information
for evaluating conditionals, not precisely being able to determine the values of strings being subject to eval statements,
etc. are common sources for false positives. Compared to
classical heap analysis in programs that merges nodes in
heaps, Vex performs a much more accurate analysis that
reduces the number of false positives considerably. In experiments, we found that Vex produces very few false positives.