102 COMMUNICATIONS OF THE ACM | MARCH 2019 | VOL. 62 | NO. 3
deal with the combination of unrestricted network structure
and a large number of possible predictions per element).
2.4. Output program
Finally, after the new names are inferred, our system transforms the original program to use these names. The output of the entire inference process is captured in the
program shown in Figure 4(e). Notice how in this output
program, the names tend to accurately capture what the program does.
2.5. Predicting type annotations
Even though we illustrated the inference process for variable names, the overall flow for predicting type annotations is identical. Interestingly, after predicting types,
one can invoke a standard type checker to check whether
the predicted types are valid for the program. For our
example in Figure 4(e), the predicted type annotations
(shown in comments) are indeed valid. In general, when
predicting semantic properties (such as types) where
soundness plays a role, our approach can be used as part
of a guess-and-check loop.
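The guess-and-check loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `predict_ranked_types` (the learned model's top-k candidates, most likely first) and `type_checks` (a standard type checker) are hypothetical stand-ins.

```python
# Guess-and-check loop for predicted type annotations (sketch).
# predict_ranked_types and type_checks are hypothetical stand-ins for
# a learned model's ranked predictions and a standard type checker.

def infer_valid_types(program, predict_ranked_types, type_checks, k=10):
    """Return the first predicted type assignment the checker accepts."""
    for candidate in predict_ranked_types(program, k):  # most likely first
        if type_checks(program, candidate):
            return candidate  # validated by the type checker, hence sound
    return None  # no valid assignment among the top-k guesses
```

Because every returned assignment has passed the checker, unsoundness in the statistical model never leaks into the output; the model only influences which valid assignment is found first.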
2.6. Independent of minification
We note that our name inference process is independent of
what the minified names are. In particular, the process will
return the same names regardless of which minifier was
used to obfuscate the original program (provided these minifiers always rename the same set of variables).
3. STRUCTURED PREDICTION
We now introduce the structured prediction approach. We
later instantiate this approach in Section 4.
3.1. Notation: programs, labels, predictions
Let x ∈ X be a program. As with standard program analysis,
we will infer properties about program statements or
expressions (referred to as program elements). For a program x, each element (e.g., a variable) is identified with
an index (a natural number). We usually separate the elements into two kinds: (i) elements for which we are interested in inferring properties and (ii) elements whose
properties we know (e.g., obtained via, say, standard program analysis or manual annotation). We use two helper
functions n, m: X → N to return the appropriate number of
program elements for a given program x: n(x) returns the
number of elements of kind (i) and m(x) the number of elements of kind (ii). When x is clear from the context, we write n instead of n(x) and m instead of m(x).
We use the set LabelsU to denote all possible values that a
predicted property can take. For instance, in type prediction, LabelsU contains all possible basic types (e.g., number,
string, etc.). Then, for a program x, we use the notation
y = (y1, …, yn(x)) to denote a vector of predicted program properties. Here, y ∈ Y where Y = (LabelsU)*. That is, each entry yi in
vector y ranges over LabelsU and denotes that program element i has property yi.
For a program x, we define the vector zx = (z1, …, zm(x)) to
capture known properties. Here, each zi can take a value
from the set of properties LabelsK which could potentially
differ from the set of properties LabelsU that we use for
prediction. For example, if the known properties are integer constants, LabelsK will contain all valid integers. To
avoid clutter when x is clear from the context, we use z
instead of zx. We use Labels = LabelsU ∪ LabelsK to denote
the set of all properties.
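The notation can be made concrete with a small sketch. All concrete labels and helper names below are illustrative assumptions; the paper fixes only the abstract sets LabelsU and LabelsK, not a data structure.

```python
# Illustration of the notation: for a program x, the n(x) unknown elements
# receive predicted labels y_i drawn from LabelsU, while the m(x) known
# elements carry fixed labels z_i drawn from LabelsK. The concrete label
# sets here are toy examples for type prediction with known constants.

LABELS_U = {"number", "string", "boolean"}   # properties we predict
LABELS_K = {"0", "1", '"hello"'}             # known properties, e.g. constants

def is_valid_prediction(y, n):
    """A prediction y assigns one property from LabelsU to each of n elements."""
    return len(y) == n and all(label in LABELS_U for label in y)

y = ("number", "string")   # predictions for n(x) = 2 unknown elements
z = ("0", '"hello"')       # fixed properties of m(x) = 2 known elements
```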
Note that to apply this method, the total number of predictions must be fixed (bounded) in advance (i.e., n(x)). This is unlike other settings, for example, grammars,10 where the number of predictions can be unbounded.
3.2. Problem definition
Let D = {⟨x(j), y(j)⟩}j=1…t denote the training data: a set of t programs, each annotated with corresponding properties. Our goal is to learn a model that captures the conditional probability Pr(y | x). Once the model is learned, we can predict properties of new programs by posing the following MAP inference query:

y = arg max y′∈Ωx Pr(y′ | x)
That is, we aim to find the most likely assignment of
program properties y according to the probabilistic
model. Here, Ωx ⊆ Y describes the set of possible assignments of properties y′ to program elements of x. The set
Ωx allows restricting the set of possible properties and is
useful for encoding problem-specific constraints. For
example, in type annotation inference, the set Ωx may restrict the annotations to types that make the resulting program typecheck.
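For small label sets, the MAP query can be answered by brute-force enumeration over Ωx, which makes the role of the constraint set concrete. This is a sketch under the assumption of a generic `score` function standing in for the learned model; practical inference uses far more efficient (approximate) search, not this exponential loop.

```python
from itertools import product

def map_inference(labels, n, score, constraint=lambda y: True):
    """Return arg max over y' in Omega_x of score(y').

    Omega_x is modeled as the full product labels^n filtered by
    `constraint`. Exponential in n: for illustration only.
    """
    best, best_score = None, float("-inf")
    for y in product(labels, repeat=n):
        if not constraint(y):   # Omega_x: drop disallowed assignments
            continue
        s = score(y)
        if s > best_score:
            best, best_score = y, s
    return best
```

In type inference, for instance, `constraint` would reject assignments under which the program fails to typecheck.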
3.3. Conditional random fields (CRFs)
We now briefly describe CRFs, a particular model defined in
Lafferty et al.15 and previously used for a range of tasks such
as natural language processing, image classification, and
others. CRFs represent the conditional probability Pr(y | x).
We consider the case where the factors are positive, in which case, without loss of generality, any conditional probability of properties y given a program x can be encoded as follows:

Pr(y | x) = (1/Z(x)) exp(score(y, x))

where score is a function that returns a real number indicating the score of an assignment of properties y for a program
x. Assignments with higher score are more likely than
assignments with lower score. Z(x), called the partition function, ensures that the above expression does in fact encode a
conditional distribution. It returns a real number depending only on the program x, that is:

Z(x) = Σ y′∈Ωx exp(score(y′, x))
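The relationship between score, partition function, and conditional probability can be sketched directly for a finite Ωx. The feature functions and weights below are toy placeholders, not the paper's model; Ωx is taken to be the full product of labels for simplicity.

```python
import math
from itertools import product

# CRF-style conditional distribution over a finite label set (sketch).
# score(y, x) = sum_i w_i * f_i(y, x);  Pr(y | x) = exp(score(y, x)) / Z(x).

def score(y, x, weights, features):
    """Weighted sum of feature functions evaluated on assignment y."""
    return sum(w * f(y, x) for w, f in zip(weights, features))

def partition(x, labels, n, weights, features):
    """Z(x): sum of exp(score) over every assignment y' in Omega_x."""
    return sum(math.exp(score(y, x, weights, features))
               for y in product(labels, repeat=n))

def probability(y, x, labels, weights, features):
    """Pr(y | x): normalized by Z(x), so probabilities sum to one."""
    z = partition(x, labels, len(y), weights, features)
    return math.exp(score(y, x, weights, features)) / z
```

Note that `partition` iterates over the same set of assignments as MAP inference, which is why computing Z(x) exactly is expensive; for prediction alone, the arg max of score suffices since Z(x) does not depend on y.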
W.l.o.g., score can be expressed as a sum of k feature functions fi associated with weights wi:

score(y, x) = Σ i=1…k wi fi(y, x) = wᵀf(y, x)
Here, f is a vector of functions fi and w is a vector of weights