us to better answer these questions.
First, we permitted ourselves to train
our classifier only on the 100-variable
instances. Second, we considered only
decision trees[7] having at most two decision nodes. (We obtained these models by a standard decision tree learning
procedure: greedily choosing a feature
that best partitioned the training data
into satisfiable and unsatisfiable instances, and recursively splitting the
resulting partitions.) Finally, we omitted all probing features. Although
probing (for example, local-search) features were very useful for prediction,
they were disproportionately effective
on small instances and hence would
have complicated our study of scaling
behavior. Also, because we aimed to obtain easily comprehensible models, we
were disinclined to use features based
on complex, heuristic algorithms.
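To make the training procedure concrete, the following is a minimal sketch (our illustration, not code from the study) of greedily building a tree with at most two decision nodes; the feature matrix X and satisfiability labels y are placeholders.

```python
import numpy as np

def best_split(X, y):
    """Greedily pick the (feature, threshold) pair whose single split best
    separates satisfiable (y=1) from unsatisfiable (y=0) instances."""
    best_acc, best = -1.0, None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:  # candidate thresholds
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            # Each side of the split predicts its majority class.
            correct = max(left.sum(), len(left) - left.sum()) \
                    + max(right.sum(), len(right) - right.sum())
            if correct / len(y) > best_acc:
                best_acc, best = correct / len(y), (j, t)
    return best

def fit_two_node_tree(X, y):
    """Split at the root, then greedily split one resulting partition;
    a fuller implementation would consider splitting either partition."""
    j1, t1 = best_split(X, y)
    mask = X[:, j1] <= t1
    j2, t2 = best_split(X[mask], y[mask])
    return (j1, t1), (j2, t2)
```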
Given all of these restrictions, we
were astonished to observe predictive
accuracies consistently above 65%, and
apparently independent of problem
size (see Figure 3, left; statistical testing
showed no evidence that accuracy falls
with problem size). Indeed, our first
two restrictions appear to have come
at low cost, decreasing accuracies by
only about 5%. (Furthermore, after lift-
ing these restrictions, we still found no
evidence that accuracy was affected by
problem size.) Hence, the reader may
be interested in understanding our
two-feature model in more detail; we
hope it will serve as a starting point for
new theoretical analysis of finite-size
SAT instances at the phase transition.
The model is given in Figure 3 (right).
LPSLACK_coeff_variation is based on solving a linear programming relaxation of an integer program representation of SAT instances. For each variable i with LP solution value S_i ∈ [0, 1], LPSLACK_i is defined as min{1 − S_i, S_i}: the deviation of S_i from integrality. LPSLACK_coeff_variation is then the coefficient of variation (the standard deviation divided by the mean) of the vector LPSLACK. POSNEG_ratio_var_mean is based on the balance between positive and negative occurrences of each variable. For each variable i with P_i positive occurrences and N_i negative occurrences, POSNEG_ratio_var_i is defined as |0.5 − P_i / (P_i + N_i)|. POSNEG_ratio_var_mean is then the average over elements of the vector POSNEG_ratio_var. Finally, recall that our model was trained on constant-size instances; we normalized the LPSLACK_coeff_variation and POSNEG_ratio_var_mean features to have mean 0 and standard deviation 1 on this training set. To evaluate the model on a given instance of a different size, we randomly sampled many new instances of that size to compute new normalization factors, which we then applied to the given instance.
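For concreteness, here is a small sketch of both features and the renormalization step (our illustration; it assumes the LP relaxation values and the per-variable literal counts have already been computed):

```python
import numpy as np

def lpslack_coeff_variation(S):
    """S[i] in [0, 1] is the LP-relaxation value of variable i; LPSLACK_i =
    min(1 - S_i, S_i) is its deviation from integrality, and the feature is
    the coefficient of variation (std / mean) of that vector."""
    slack = np.minimum(1.0 - S, S)
    return np.std(slack) / np.mean(slack)

def posneg_ratio_var_mean(P, N):
    """P[i] and N[i] count the positive and negative occurrences of
    variable i; the feature averages |0.5 - P_i / (P_i + N_i)|."""
    return np.mean(np.abs(0.5 - P / (P + N)))

def renormalize(raw_value, sampled_values):
    """Z-score a raw feature value against feature values computed on many
    instances sampled at the target size, as described above."""
    return (raw_value - np.mean(sampled_values)) / np.std(sampled_values)
```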
Beyond Uniform-Random 3-SAT
Our original motivation involved
studying real problems faced by a
practitioner, problems that are very
unlikely to have uniform random
structure. Thus, it is important to
demonstrate that EHMs work reliably
for a wide range of more realistic in-
stance distributions, and that they are
not limited to SAT. In short, they do,
and they are not. By now, we have built
EHMs for four different NP-complete
problems: SAT,[14,16,21,33,38,40] the combinatorial auction winner determination problem (WDP),[28,29] mixed integer programming (MIP, a standard encoding for problems with both discrete and continuous variables),[14,19,21] and the traveling salesman problem (TSP).[21]
Observe that we have considered both
optimization and decision problems,
and that these problems involve discrete variables, continuous variables,
and combinations of the two. For each
problem, we derived a new set of instance features. This was not trivial,
but not terribly difficult either; in all
cases we used problem size measures,
syntactic properties, and probing features. Extending what we know now to
a new domain would probably entail a
few days of work. In our publications
and other technical work (for example,
submissions to SAT competitions) we
have considered more than 30 instance
distributions. These include sophisticated random generators (for example,
SAT reductions from graph coloring
and factoring; combinatorial auction
benchmarks based on economic models); instance sets from public benchmarks and competitions (for example,
MIPLIB; the SAT competition); and
sets of instances derived from practical applications (for example, SAT-encoded instances from software verification and bounded model checking;
industrial MIP instances ranging from …
Application 1. Algorithm Selection (SATzilla)

There currently exists no “best” SAT solver; different solvers perform well on different
families of instances, and performance differences between them are typically very
large. The effectiveness of EHMs suggests a straightforward solution to the algorithm
selection problem:[35] given a new problem instance, predict the runtime of several SAT solvers, and then run the one predicted to be fastest. This approach[27] forms the core of SATzilla,[32,33,40] a portfolio-based algorithm selector for SAT.
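In code, the selection rule is a one-liner over trained models (a sketch; the model objects and their predict method are placeholders standing in for EHMs):

```python
def select_solver(ehms, features):
    """ehms maps each solver name to a trained runtime model; return the
    solver whose predicted runtime on this instance is lowest."""
    return min(ehms, key=lambda solver: ehms[solver].predict(features))
```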
SATzilla first participated in the 2003 SAT Competition (http://www.satcompetition.org), and placed second and third in several categories. We have since
extensively improved the approach, allowing randomized and local search algorithms
as component solvers; introducing the idea of presolvers that are run for a short, fixed
time before the selected solver; adding the ability to optimize for complex scoring
functions; and automating the construction of the selector (for example, presolver
selection; component solver selection) given data. Leveraging these improvements,
and benefiting from the continued improvement of the component solvers upon which
it draws, SATzilla led the field in the 2007 and 2009 SAT Competitions, winning five
medals each time.
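The presolver idea can be sketched as follows (the run, solved, and time_limit names are hypothetical, not SATzilla's actual interface):

```python
def solve_with_presolvers(instance, presolvers, cutoff, select):
    """Run each presolver for a short, fixed time; only if none solves the
    instance do we pay for feature computation and model-based selection."""
    for presolver in presolvers:
        result = presolver.run(instance, time_limit=cutoff)
        if result.solved:
            return result
    return select(instance).run(instance)
```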
More recently, our design of SATzilla evolved from selection based on runtime
predictions (EHMs) to a cost-sensitive classification approach that directly selects the
best-performing solver without predicting runtime.[41] In the 2012 SAT Challenge (http://baldur.iti.kit.edu/SAT-Challenge-2012), SATzilla was eligible to enter four categories;
it placed first in three of these and second in the fourth. Overall, SATzilla’s success
demonstrates the effectiveness of automated, statistical methods for combining
existing solvers—including “uncompetitive” solvers with poor average performance.
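One way to realize such cost-sensitive selection, sketched here under the assumption of one classifier per solver pair (trained with instances weighted by the runtime difference between the two solvers), is a pairwise vote; this is an illustration, not necessarily SATzilla's exact mechanism:

```python
def vote_select(pairwise_clfs, features):
    """Each classifier predicts which of its two solvers will be faster on
    this instance; the solver collecting the most votes is selected."""
    votes = {}
    for clf in pairwise_clfs.values():
        winner = clf.predict(features)  # returns a solver name
        votes[winner] = votes.get(winner, 0) + 1
    return max(votes, key=votes.get)
```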
Except for the instance features used by our models, our approach is entirely general,
and is likely to work well for other problems with high runtime variation. All of our
software is publicly available; see http://www.cs.ubc.ca/labs/beta/Projects/SATzilla.