Predicting Structured Objects with Support Vector Machines
By Thorsten Joachims, Thomas Hofmann, Yisong Yue, and Chun-Nam Yu
DOI: 10.1145/1592761.1592783
ABSTRACT
Machine Learning today offers a broad repertoire of methods for classification and regression. But what if we need
to predict complex objects like trees, orderings, or alignments? Such problems arise naturally in natural language
processing, search engines, and bioinformatics. The following explores a generalization of Support Vector Machines
(SVMs) for such complex prediction problems.
1. INTRODUCTION
Consider the problem of natural language parsing illustrated in Figure 1 (left). A parser takes as input a natural
language sentence, and the desired output is the parse tree
decomposing the sentence into its constituents. While
modern machine learning methods like Boosting, Bagging,
and Support Vector Machines (SVMs) (see, e.g., Hastie et al.9) have become the methods of choice for other problems
in natural language processing (NLP) (e.g., word-sense disambiguation), parsing does not fit into the conventional
framework of classification and regression. In parsing, the
prediction is not a single yes/no, but a labeled tree. And
breaking the tree down into a collection of yes/no predictions is far from straightforward, since it would require
modeling a host of interdependencies between individual
predictions. So, how can we take, say, an SVM and learn a
rule for predicting trees?
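To make this concrete, the following sketch (ours, not from the article; the sentence and bracketing are only illustrative) represents a parse tree as a single structured output y. Listing its constituent spans shows why the parts cannot simply be predicted independently: the spans must nest and jointly cover the sentence.

    x = "The dog chased the cat"

    # y as one nested (label, children...) structure; leaves are (tag, word) pairs.
    y = ("S",
         ("NP", ("DT", "The"), ("NN", "dog")),
         ("VP", ("VBD", "chased"),
                ("NP", ("DT", "the"), ("NN", "cat"))))

    def constituents(tree, i=0):
        """List the (label, start, end) spans of a parse tree. The spans must
        nest and jointly cover the sentence, so they cannot be treated as
        independent yes/no decisions."""
        label = tree[0]
        if len(tree) == 2 and isinstance(tree[1], str):   # preterminal: (tag, word)
            return [(label, i, i + 1)], i + 1
        spans, j = [], i
        for child in tree[1:]:
            child_spans, j = constituents(child, j)
            spans.extend(child_spans)
        return [(label, i, j)] + spans, j

    spans, _ = constituents(y)
    print(spans)   # ('S', 0, 5), ('NP', 0, 2), ('DT', 0, 1), ...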
Obviously, this question arises not only for learning to predict trees, but similarly for a variety of other structured and
complex outputs. Structured output prediction is the name for
such learning tasks, where one aims at learning a function
h: X → Y mapping inputs x ∈ X to complex and structured
outputs y ∈ Y. In NLP, structured output prediction tasks
range from predicting an equivalence relation in noun-phrase
co-reference resolution (see Figure 1, right), to predicting an
accurate and well-formed translation in machine translation.
But problems of this type are also plentiful beyond NLP. For
example, consider the problem of image segmentation (i.e.,
predicting an equivalence relation y over a matrix of pixels x),
the problem of protein structure prediction (which we will
phrase as an alignment problem in Section 3.2), or the problem of web search (i.e., predicting a diversified document
ranking y given a query x as explored in Section 3.1). We will
see in Section 3.3 that even binary classification becomes a
structured prediction task when aiming to optimize multivariate performance measures like the F1-Score or Precision@k.
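As a small illustration of what "multivariate" means here (our example, not from the article), the two functions below compute the F1-score and Precision@k from an entire vector of predictions; neither quantity decomposes into a sum of losses over individual examples, which is why optimizing them is treated as a structured prediction problem in Section 3.3.

    def f1_score(y_true, y_pred):
        """F1-score of a +1/-1 prediction vector: the harmonic mean of precision
        and recall, computed from the whole vector at once."""
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t == -1 and p == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == -1)
        if tp == 0:
            return 0.0
        prec, rec = tp / (tp + fp), tp / (tp + fn)
        return 2 * prec * rec / (prec + rec)

    def precision_at_k(y_true, scores, k):
        """Fraction of relevant items (label +1) among the k highest-scored ones;
        again a property of the full ranking, not of any single prediction."""
        top_k = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
        return sum(1 for i in top_k if y_true[i] == 1) / k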
In this paper we describe a generalization of SVMs, called
structural SVMs,14, 26, 27 that can be used to address a large
range of structured output prediction tasks. On the one
hand, structural SVMs inherit the attractive properties of
regular SVMs, namely a convex training problem, flexibility
in the choice of loss function, and the opportunity to learn
nonlinear rules via kernels. On the other hand, structural
SVMs inherit the expressive power of generative models (e.g.,
Probabilistic Context-Free Grammars [PCFG] or Markov
Random Fields [MRF]). Most importantly, however, structural SVMs are a discriminative learning method that does
not require the independence assumptions made by conventional generative methods. Similar to the increase in prediction accuracy one typically sees when switching from a naive
Bayes classifier to a classification SVM (e.g., Joachims11),
structural SVMs promise such benefits also for structured
prediction tasks (e.g., training PCFGs for parsing).
Structural SVMs follow and build upon a line of research
on discriminative training for structured output prediction, including generalizations of Neural Nets,17 Logistic
Regression,16 and other conditional likelihood methods (e.g.,
McCallum et al.18). A particularly eye-opening paper is Lafferty
et al.,16 showing that global training for structured prediction
can be formulated as a convex optimization problem. Our
work follows this track. More closely, however, we build upon
structural Perceptrons5 as well as methods for protein threading19 and extend these to a large-margin formulation with an
efficient training algorithm. Independent of our work, Taskar
et al.25 arrived at a similar formulation, but with more restrictive conditions on the form of the learning task.
In the following, we explain how structural SVMs work
and highlight three applications in search engine ranking,
protein structure prediction, and binary classification under
nonstandard performance measures.
2. STRUCTURAL SVMS
How can one approach structured output prediction? On
an abstract level, a structured prediction task is much like a
multiclass learning task. Each possible structure y ∈ Y (e.g.,
parse tree) corresponds to one class (see Figure 2), and classifying a new example x amounts to predicting its correct
“class.” While the following derivation of structural SVMs
starts from multiclass SVMs,6 there are four key problems
that need to be overcome. All of these problems arise from
the huge number |Y| of classes. In parsing, for example, the
number of possible parse trees is exponential in the length
of the sentence. And the situation is similar for most other
structured output prediction problems.
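To see why the size of Y matters, recall that structural SVMs predict, as multiclass SVMs do, by maximizing a linear discriminant over all candidate outputs, h(x) = argmax_{y ∈ Y} w · Ψ(x, y). The brute-force sketch below (our illustration; psi stands in for the joint feature map) works for ordinary multiclass problems but becomes hopeless once Y contains, say, all parse trees of a sentence.

    import numpy as np

    def predict(x, candidates, w, psi):
        """Multiclass-style prediction for structured outputs:
            h(x) = argmax_{y in Y}  w . psi(x, y)
        Exhaustively scoring `candidates` is feasible only when |Y| is small;
        for parse trees |Y| grows exponentially with sentence length, so
        structural SVMs rely on problem-specific argmax algorithms instead
        (e.g., CKY-style dynamic programming for parsing)."""
        best_y, best_score = None, -np.inf
        for y in candidates:
            score = np.dot(w, psi(x, y))
            if score > best_score:
                best_y, best_score = y, score
        return best_y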
A previous version of this paper—“Large Margin Methods
for Structured and Interdependent Output Variables” by
I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun
—was published in J. Mach. Learn. Res. (Sept. 2005).