learning algorithm to identify the most
useful ones. Our only requirement is
that the features be computable in
low-order polynomial time; in some
applications, we also restrict ourselves
to features that are quadratic time or
faster. For the SAT domain, we defined
138 features summarized as:
˲ Problem size measures c and v, plus
nonlinear combinations we expected to
be important, like c/v and c/v − 4.26;
˲ Syntactic properties of the instance (proximity to Horn clauses, balance of positive and negative literals, and so on);
˲ Constraint graph statistics. We
considered three graphs: nodes for
variables and edges representing
shared constraints (clauses); nodes for
clauses and edges representing shared
variables with opposite polarity; nodes
for both clauses and variables, and
edges representing the occurrence of
a variable in a given clause. For each
graph, we computed various statistics
based on node degrees, path lengths
and clustering, among others;
˲ A measure of the integrality of
the optimal solution to the linear programming relaxation of the given SAT
instance—specifically, the distance
between this solution and the nearest
(feasible or infeasible) integral point;
˲ Knuth’s estimate of search tree size [25]; and,
˲ Probing features computed by
running bounded-length trajectories
of local search and tree search algorithms
and extracting statistics from these runs.
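Many of the features above can be computed cheaply, straight from the clause list. The sketch below is purely illustrative: the feature names and definitions are our own simplifications, not the exact definitions used among the 138 features. It computes the size measures, the positive/negative literal balance, and degree statistics of the variable graph (nodes for variables, edges for shared clauses):

```python
from statistics import mean, stdev

def basic_sat_features(clauses, num_vars):
    """Compute a few illustrative features of the kinds listed above.

    `clauses` is a list of clauses, each a list of nonzero ints
    (DIMACS-style literals: a variable index, negated if < 0).
    """
    c, v = len(clauses), num_vars

    # Problem size measures and nonlinear combinations thereof.
    ratio = c / v
    feats = {"c": c, "v": v, "c/v": ratio, "c/v - 4.26": ratio - 4.26}

    # Balance of positive vs. negative literals, per clause.
    pos_fracs = [sum(l > 0 for l in cl) / len(cl) for cl in clauses]
    feats["mean_pos_literal_frac"] = mean(pos_fracs)
    feats["std_pos_literal_frac"] = stdev(pos_fracs) if c > 1 else 0.0

    # Variable graph: an edge joins two variables that share a clause.
    neighbors = {i: set() for i in range(1, v + 1)}
    for cl in clauses:
        vs = {abs(l) for l in cl}
        for a in vs:
            neighbors[a] |= vs - {a}
    degrees = [len(n) for n in neighbors.values()]
    feats["var_graph_mean_degree"] = mean(degrees)
    feats["var_graph_max_degree"] = max(degrees)
    return feats
```

The path-length and clustering statistics, the LP-relaxation integrality measure, and the probing features require more machinery (graph traversals, an LP solver, instrumented solvers) but follow the same pattern: each maps an instance to a fixed-length vector of numbers.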

The per-instance runtimes in Figure 1 (right) exhibit substantial variation at each point along the x
axis—over two orders of magnitude at
the “hard” phase transition point. The
runtime pattern also depends on satisfi-
ability status: hard instances are scarcer
and runtime variation is greater among
satisfiable instances than among unsat-
isfiable instances. One reason for this is
that on satisfiable instances the solver
can stop as soon as it encounters a satis-
fying assignment, whereas for unsatisfi-
able instances a solver must prove that
no satisfying assignment exists any-
where in the search tree.
A Case Study on
Uniform-Random 3-SAT
We now ask whether we can better understand the relationship between instance structure and solver runtime by
considering instance features beyond
just c/v. We will then use a machine
learning technique to infer a relationship between these features and runtime. Formally, we start with a set I of
instances, a vector xi of feature values
for each i ∈ I, and a runtime observation yi for each i ∈ I, obtained by running a given algorithm on i. Our goal
will be to identify a mapping f : x → y
that predicts yi as accurately as possible, given xi. We call such a mapping
an EHM.^c (Footnote c: It is sometimes useful
to build EHMs that predict a probability
distribution over runtimes rather than a
single runtime; see Hutter et al. [21]. For
simplicity, here we discuss only the prediction
of mean runtime.) Observe that we have just
described a supervised learning problem, and more specifically a regression
problem. There are many different regression algorithms that one could use
to solve this problem, and indeed, over
the years we have considered about a
dozen alternatives. Later in this article
we will advocate for a relatively sophisticated learning paradigm (random forests of regression trees), but we begin
by discussing a very simple approach:
quadratic ridge regression [5].
This method performs linear regression based on
the given features and their pairwise
products, and penalizes increases in
feature coefficients (the “ridge”). We
elaborate this method in two ways.
First, we transform the response variable by taking its (base-10) logarithm;
this better allows runtimes, which vary
by orders of magnitude, to be described
by a linear model. Second, we reduce
the set of features by performing forward selection: we start with an empty
set and iteratively add the feature that
(myopically) most improves prediction. The result is simpler, more robust
models that are less prone to numerical
problems. Overall, we have found that
even simple learning algorithms like
this one usually suffice to build strong
EHMs; more important is identifying a
good set of instance features.
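The pipeline just described (quadratic basis expansion, ridge-penalized least squares on log10 runtime, and myopic forward selection) can be sketched as follows. This is a simplified numpy illustration, not the authors' implementation; the regularization strength `lam` is an assumed placeholder:

```python
import numpy as np

def quadratic_expand(X):
    """Augment the raw features with all pairwise products (incl. squares)."""
    cols = [X]
    d = X.shape[1]
    for i in range(d):
        for j in range(i, d):
            cols.append((X[:, i] * X[:, j])[:, None])
    return np.hstack(cols)

def ridge_fit(X, y, lam=1.0):
    """Ridge regression with an intercept; only feature coefficients
    are penalized (the 'ridge')."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    penalty = lam * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)

def ridge_rmse(X, y, w):
    Xb = np.hstack([np.ones((len(X), 1)), X])
    return float(np.sqrt(np.mean((Xb @ w - y) ** 2)))

def forward_select(X, y, k, lam=1.0):
    """Start from an empty set and greedily add the raw feature that
    (myopically) most reduces RMSE of the quadratic ridge model."""
    chosen = []
    while len(chosen) < k:
        def score(j):
            Q = quadratic_expand(X[:, chosen + [j]])
            return ridge_rmse(Q, y, ridge_fit(Q, y, lam))
        chosen.append(min((j for j in range(X.shape[1])
                           if j not in chosen), key=score))
    return chosen

# The targets would be log10 runtimes, so that a linear model can
# describe variation spanning orders of magnitude:
#     y = np.log10(runtimes)
```

In practice one would score candidate features on held-out data rather than training RMSE; the training-set version above keeps the sketch short.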
Instance features. It can be difficult
to identify features that correlate as
strongly with instance hardness as c/v.
We therefore advocate including all features that show some promise of being
predictive, and relying on the machine-
Figure 1. Runtime of march_hi on uniform-random 3-SAT instances with v = 400 and variable c/v ratio. Left: mean runtime, along with
p(c, v); right: per-instance runtimes, colored by satisfiability status. Runtimes were measured with an accuracy of 0.01s, leading to the
discretization effects visible near the bottom of the figure. Every point represents one SAT instance.
[Figure 1 plot data: x axis, clauses-to-variables ratio from 3.26 to 5.26; y axis, march_hi mean runtime [CPU sec] on a log scale from 10^-2 to 10^2; left-panel color scale, p(c, v) from -1 to 1.]