Doi: 10.1145/1592761.1592783
By Thorsten Joachims, Thomas Hofmann, Yisong Yue, and Chun-nam Yu
abstract Machine Learning today offers a broad repertoire of methods for classification and regression. But what if we need to predict complex objects like trees, orderings, or alignments? Such problems arise naturally in natural language processing, search engines, and bioinformatics. The following explores a generalization of Support Vector Machines (SVMs) for such complex prediction problems.
Consider the problem of natural language parsing illustrated in Figure 1 (left). A parser takes as input a natural language sentence, and the desired output is the parse tree decomposing the sentence into its constituents. While modern machine learning methods like Boosting, Bagging, and Support Vector Machines (SVMs) (see e.g., Hastie et al. 9) have become the methods of choice for other problems in natural language processing (NLP) (e.g. word-sense dis-ambiguation), parsing does not fit into the conventional framework of classification and regression. In parsing, the prediction is not a single yes/no, but a labeled tree. And breaking the tree down into a collection of yes/no predictions is far from straightforward, since it would require modeling a host of interdependencies between individual predictions. So, how can we take, say, an SVM and learn a rule for predicting trees?
Obviously, this question arises not only for learning to predict trees, but similarly for a variety of other structured and complex outputs. Structured output prediction is the name for such learning tasks, where one aims at learning a function h: X → Y mapping inputs x Î X to complex and structured outputs y Î Y. In NLP, structured output prediction tasks range from predicting an equivalence relation in noun-phrase co-reference resolution (see Figure 1, right), to predicting an accurate and well-formed translation in machine translation. But problems of this type are also plentiful beyond NLP. For example, consider the problem of image segmentation (i.e., predicting an equivalence relation y over a matrix of pixels x), the problem of protein structure prediction (which we will phrase as an alignment problem in Section 3. 2), or the problem of web search (i.e., predicting a diversified document ranking y given a query x as explored in Section 3. 1). We will see in Section 3. 3 that even binary classification becomes a structured prediction task when aiming to optimize multivariate performance measures like the F1-Score or Precision@k.
In this paper we describe a generalization of SVMs, called Structural SVMs, 14, 26, 27 that can be used to address a large range of structured output prediction tasks. On the one hand, structural SVMs inherit the attractive properties of
regular SVMs, namely a convex training problem, flexibility in the choice of loss function, and the opportunity to learn nonlinear rules via kernels. On the other hand, structural SVMs inherit the expressive power of generative models (e.g., Probabilistic Context-Free Grammars [PCFG] or Markov Random Fields [MRF]). Most importantly, however, structural SVMs are a discriminative learning method that does not require the independence assumptions made by conventional generative methods. Similar to the increase in prediction accuracy one typically sees when switching from a naive Bayes classifier to a classification SVM (e.g., Joachims11), structural SVMs promise such benefits also for structured prediction tasks (e.g., training PCFGs for parsing).
Structural SVMs follow and build upon a line of research on discriminative training for structured output prediction, including generalizations of Neural Nets, 17 Logistic Regression16, and other conditional likelihood methods (e.g., McCallum et al. 18). A particularly eye-opening paper is Lafferty et al. 16 showing that global training for structured prediction can be formulated as a convex optimization problem. Our work follows this track. More closely, however, we build upon structural Perceptrons5 as well as methods for protein threading19 and extend these to a large-margin formulation with an efficient training algorithm. Independent of our work, Taskar et al. 25 arrived at a similar formulation, but with more restrictive conditions on the form of the learning task.
In the following, we explain how structural SVMs work and highlight three applications in search engine ranking, protein structure prediction, and binary classification under nonstandard performance measures.
How can one approach structured output prediction? On an abstract level, a structured prediction task is much like a multiclass learning task. Each possible structure y Î Y (e.g., parse tree) corresponds to one class (see Figure 2), and classifying a new example x amounts to predicting its correct “class.” While the following derivation of structural SVMs starts from multiclass SVMs, 6 there are four key problems that need to be overcome. All of these problems arise from the huge number |Y | of classes. In parsing, for example, the number of possible parse trees is exponential in the length of the sentence. And the situation is similar for most other structured output prediction problems.
A previous version of this paper—“Large Margin Methods for Structured and Interdependent Output Variables” by I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun —was published in J. Mach. Learn. Res. (Sept. 2005).
References:
Archives