The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
John Langford poses questions about the direction of research for
machine learning and algorithms. Ruben Ortega shares lessons
about agile development practices like Scrum.
John Langford “Research Directions for Machine Learning and Algorithms” http://cacm.acm.org/ blogs/blog-cacm/108385 May 16, 2011
S. Muthu Muthukrishnan invited me
to the National Science Foundation’s
Workshop on Algorithms in the Field
with the goal of providing a sense of
where near-term research should go.
When the time came, though, I instead
bargained for a post, which provides a
chance for other people to comment.
There are several things I did not
fully understand when I went to Yahoo!
about five years ago. I would like to repeat them as people in academia may
not yet understand them intuitively.
1. Almost all the big-impact algo-
rithms operate in pseudo-linear or
better time. Think about caching,
hashing, sorting, filtering, etc. and you
have a sense of what some of the most
heavily used algorithms are. This mat-
ters quite a bit to machine learning
research, because people often work
with superlinear time algorithms
and languages. Two very common ex-
amples of this are graphical models
where inference is often a superlin-
ear operation—think about the n2 de-
pendence on the number of states in
a Hidden Markov Model and Kernel-
ized Support Vector Machines where
optimization is typically quadratic or
worse. There are two basic reasons
for this. The most obvious is that lin-
ear time allows you to deal with large
datasets. A less obvious but critical
point is that a superlinear time algo-
rithm is inherently buggy; it has an
unreliable running time that can eas-
ily explode if you accidentally give it
too much input.