Table 1. Comparing a conventional SVM to a structural SVM that directly
optimizes the performance measure.

Dataset     Method       F1     PRBeP   ROCArea
Reuters     STRUCT SVM   62.0   68.2    99.1
            SVM          56.1   65.7    98.6
ArXiv       STRUCT SVM   56.8   58.4    92.8
            SVM          49.6   57.9    92.7
Optdigits   STRUCT SVM   92.5   92.7    99.4
            SVM          91.5   91.5    99.4
Covertype   STRUCT SVM   73.8   72.1    94.6
            SVM          73.9   71.0    94.1
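The three measures reported in Table 1 can all be computed directly from real-valued classifier scores. The following is a minimal sketch (illustrative names, not the paper's code) of how F1, the precision/recall break-even point (PRBeP), and ROCArea are evaluated for a binary task:

```python
def f1(scores, labels):
    # F1 of the predictions obtained by thresholding scores at zero.
    pred = [1 if s > 0 else 0 for s in scores]
    tp = sum(1 for p, y in zip(pred, labels) if p == 1 and y == 1)
    prec = tp / max(sum(pred), 1)
    rec = tp / max(sum(labels), 1)
    return 2 * prec * rec / (prec + rec) if prec + rec > 0 else 0.0

def prbep(scores, labels):
    # Predicting the top-p scored examples as positive (p = #positives)
    # makes precision equal recall; that common value is the PRBeP.
    p = sum(labels)
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:p]
    return sum(labels[i] for i in top) / p

def roc_area(scores, labels):
    # Fraction of (positive, negative) pairs ranked correctly; ties count 1/2.
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
                  for sp in pos for sn in neg)
    return correct / (len(pos) * len(neg))
```

Note that PRBeP and ROCArea depend only on the ranking induced by the scores, not on a decision threshold, which is part of why optimizing them requires the multivariate formulation rather than a per-example loss.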
of the desired performance measure instead of using proxy measures during training (e.g., different linear cost models). However, there are still several open questions before this process can be made fully automatic; for example, understanding the interaction between the loss Δ and the feature map Ψ, as recently explored in Chakrabarti et al.3 and Chapelle et al.4
4. CONCLUSION AND OUTLOOK
In summary, structural SVMs are flexible and efficient methods for structured output prediction with a wide range of potential applications, most of which are completely or largely unexplored. Other explored applications include hierarchical classification,2 clustering,7 and optimizing average precision.30 Due to the universality of the cutting-plane training algorithm, only relatively small API changes are required for any new application. SVMstruct is an implementation of the algorithm with APIs in C and Python, and it is available at svmlight.joachims.org.
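The "small API changes" amount to supplying a few application-specific callbacks. The sketch below (hypothetical names, not the actual SVMstruct API) shows the three pieces a new application typically defines, instantiated for multiclass classification: the joint feature map Ψ(x, y), the loss Δ(y, ȳ), and the loss-augmented argmax used to find the most violated constraint.

```python
import numpy as np

K, D = 3, 4  # number of classes, number of input features

def psi(x, y):
    # Joint feature map: place x into the block of the weight vector
    # that belongs to class y (the Crammer-Singer construction).
    out = np.zeros(K * D)
    out[y * D:(y + 1) * D] = x
    return out

def delta(y, ybar):
    # 0/1 classification loss.
    return 0.0 if y == ybar else 1.0

def most_violated(w, x, y):
    # Loss-augmented inference: argmax over ybar of
    # Delta(y, ybar) + w . Psi(x, ybar), used by cutting-plane training.
    return max(range(K), key=lambda ybar: delta(y, ybar) + w @ psi(x, ybar))
```

Swapping in a different Ψ, Δ, and argmax routine is all that distinguishes, say, sequence tagging or alignment from this multiclass instance; the cutting-plane outer loop stays the same.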
Beyond applications, there are several areas for research in further developing the method. One such area is training structural SVMs with kernels. While the cutting-plane method can be extended to use kernels, it becomes quite slow, and more efficient methods are needed.28 Another area results from the fact that solving the two argmax problems exactly is intractable for many application problems. However, for many such problems there exist methods that compute approximate solutions. An interesting question is how the approximation quality affects the quality of the structural SVM solution.8
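As a toy illustration of why approximate argmax solvers arise, consider an output space of k-element subsets scored by a coverage objective (as in diverse-subset prediction). Exhaustive search over subsets is exponential, while a greedy heuristic runs in polynomial time; the sketch below (illustrative names and data) contrasts the two:

```python
from itertools import combinations

# Each candidate item covers a set of subtopics (illustrative data).
items = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"d"}}

def coverage(subset):
    # Number of distinct subtopics covered by the chosen items.
    covered = set()
    for i in subset:
        covered |= items[i]
    return len(covered)

def exact_argmax(k):
    # Exhaustive search: exponential in the number of items.
    return max(combinations(items, k), key=coverage)

def greedy_argmax(k):
    # Greedy heuristic: repeatedly add the item with the largest gain.
    chosen = []
    for _ in range(k):
        best = max((i for i in items if i not in chosen),
                   key=lambda i: coverage(chosen + [i]))
        chosen.append(best)
    return tuple(chosen)
```

On larger instances the greedy solution can fall short of the exact optimum, and how that gap propagates into the learned model is exactly the open question raised above.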
This work was supported in part through NSF Awards IIS-
0412894 and IIS-0713483, NIH Grants IS10RR020889 and
GM67823, a gift from Yahoo!, and by Google.
References
1. Baeza-Yates, R., Ribeiro-Neto, B. Modern Information Retrieval. Addison-Wesley-Longman, Harlow, UK (May 1999).
2. Cai, L., Hofmann, T. Hierarchical document categorization with support vector machines. In Proceedings of the ACM Conference on Information and Knowledge Management (CIKM) (2004).
3. Chakrabarti, S., Khanna, R., Sawant, U., Bhattacharyya, C. Structured learning for non-smooth ranking losses. In ACM Conference on Knowledge Discovery and Data Mining (KDD) (2008).
4. Chapelle, O., Le, Q., Smola, A. Large margin optimization of ranking measures. In NIPS Workshop on Machine Learning for Web Search (2007).
5. Collins, M. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Empirical Methods in Natural Language Processing (EMNLP) (2002), 1–8.
6. Crammer, K., Singer, Y. On the algorithmic implementation of multiclass kernel-based vector machines. J. Mach. Learn. Res. (JMLR) 2 (2001), 265–292.
7. Finley, T., Joachims, T. Supervised clustering with support vector machines. In International Conference on Machine Learning (ICML) (2005), 217–224.
8. Finley, T., Joachims, T. Training structural SVMs when exact inference is intractable. In International Conference on Machine Learning (ICML) (2008), 304–311.
9. Hastie, T., Tibshirani, R., Friedman, J. The Elements of Statistical Learning. Springer (2001).
10. Hofmann, T., Schölkopf, B., Smola, A.J. Kernel methods in machine learning. Ann. Stat. 36, 3 (2008), 1171–1220.
11. Joachims, T. Learning to Classify Text Using Support Vector Machines – Methods, Theory, and Algorithms. Kluwer/Springer (2002).
12. Joachims, T. A support vector method for multivariate performance measures. In International Conference on Machine Learning (ICML) (2005), 377–384.
13. Joachims, T. Training linear SVMs in linear time. In ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) (2006), 217–226.
14. Joachims, T., Finley, T., Yu, C.-N. Cutting-plane training of structural SVMs. Machine Learning Journal (2009). DOI 10.1007/s10994-009-5108-8.
15. Khuller, S., Moss, A., Naor, J. The budgeted maximum coverage problem. Inform. Process. Lett. 70, 1 (1997), 39–45.
16. Lafferty, J., McCallum, A., Pereira, F. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In International Conference on Machine Learning (ICML) (2001).
17. LeCun, Y., Bottou, L., Bengio, Y., Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 11 (November 1998), 2278–2324.
18. McCallum, A., Freitag, D., Pereira, F. Maximum entropy Markov models for information extraction and segmentation. In International Conference on Machine Learning (ICML) (2000), 591–598.
19. Meller, J., Elber, R. Linear programming optimization and a double statistical filter for protein threading protocols. Proteins Struct. Funct. Genet. 45 (2001), 241–261.
20. Qiu, J., Elber, R. SSALN: An alignment algorithm using structure-dependent substitution matrices and gap penalties learned from structurally aligned protein pairs. Proteins 62 (2006), 881–891.
21. Robertson, S., Walker, S., Jones, S., Hancock-Beaulieu, M., Gatford, M. Okapi at TREC-3. In Proceedings of TREC-3 (1994).
22. Sha, F., Pereira, F. Shallow parsing with conditional random fields. In NAACL '03: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology (Morristown, NJ, USA, 2003), 134–141. Association for Computational Linguistics.
23. Shindyalov, I.N., Bourne, P.E. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. 11 (1998), 739–747.
24. Swaminathan, A., Mathew, C., Kirovski, D. Essential pages. Technical report MSR-TR-2008-015, Microsoft Research (2008).
25. Taskar, B., Guestrin, C., Koller, D. Maximum-margin Markov networks. In Advances in Neural Information Processing Systems (NIPS) (2003).
26. Tsochantaridis, I., Hofmann, T., Joachims, T., Altun, Y. Support vector machine learning for interdependent and structured output spaces. In International Conference on Machine Learning (ICML) (2004), 104–112.
27. Tsochantaridis, I., Joachims, T., Hofmann, T., Altun, Y. Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res. (JMLR) 6 (September 2005), 1453–1484.
28. Yu, C.-N.J., Joachims, T. Training structural SVMs with kernels using sampled cuts. In ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD) (2008), 794–802.
29. Yu, C.-N.J., Joachims, T., Elber, R., Pillardy, J. Support vector training of protein alignment models. J. Comput. Biol. 15, 7 (September 2008), 867–880.
30. Yue, Y., Finley, T., Radlinski, F., Joachims, T. A support vector method for optimizing average precision. In ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR) (2007), 271–278.
31. Yue, Y., Joachims, T. Predicting diverse subsets using structural SVMs. In International Conference on Machine Learning (ICML) (2008), 271–278.
32. Zhai, C., Cohen, W.W., Lafferty, J. Beyond independent relevance: Methods and evaluation metrics for subtopic retrieval. In Proceedings of the ACM Conference on Research and Development in Information Retrieval (SIGIR) (2003).
33. Zhang, Y., Skolnick, J. TM-align: A protein structure alignment algorithm based on TM-score. Nucleic Acids Res. 33 (2005), 2302–2309.
Thorsten Joachims, Department of Computer Science, Cornell University, Ithaca, NY.

Thomas Hofmann, Google Inc., Zürich, Switzerland.

Yisong Yue, Department of Computer Science, Cornell University, Ithaca, NY.

Chun-Nam Yu, Department of Computer Science, Cornell University, Ithaca, NY.

© 2009 ACM 0001-0782/09/1100 $10.00