The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we'll publish
excerpts from selected posts.
The Netflix Prize, Outreach, and Japanese
Greg Linden writes about machine learning and the Netflix
Prize, Judy Robertson offers suggestions about getting teenagers
interested in computer science, and Michael Conover discusses
mobile phone usage and quick response codes in Japan.
From Greg Linden's "The Biggest Gains Come from Knowing Your Data"
Machine learning is hard. It can be awfully tempting to try to skip the work. Can't we just download a machine learning package?
Do we really need to understand what
we are doing?
It is true that off-the-shelf algorithms
are a fast way to get going and experiment. Just plug in your data and go.
The problem comes when development
stops there. By understanding the peculiarities of your data and what people
want and need on your site, and by experimenting and learning, it is likely you
can outperform a generic system.
A great example of how understanding the peculiarities of your data can
help came out of the Netflix Prize.
Progress on the $1 million prize largely
stalled until Gavin Potter discovered
peculiarities in the data, including that
people interpret the rating scale differently.
More recently, Yehuda Koren found
additional gains by supplementing the
models to allow for temporal effects,
such as that people tend to rate older
movies higher, that movies rated together in a short time window tend to
be more related, and that people over
time might start rating all the movies
they see higher or lower.
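Koren's actual models are far more sophisticated, but the underlying idea of layering user, item, and time-dependent biases on top of a global mean can be sketched in a few lines of Python. Everything here is invented for illustration: the toy ratings, the one-year time bins, and all the names are assumptions, not the Netflix data or the prize-winning code.

```python
from collections import defaultdict

# Toy ratings: (user, movie, stars, day). Purely illustrative data.
ratings = [
    ("ann", "m1", 4, 10), ("ann", "m2", 5, 12),
    ("bob", "m1", 2, 300), ("bob", "m2", 3, 305),
    ("cat", "m1", 4, 600), ("cat", "m2", 4, 610),
]

# Global mean rating.
mu = sum(r for _, _, r, _ in ratings) / len(ratings)

def mean_offsets(key_index):
    """Average deviation from the global mean, grouped by user or movie."""
    sums, counts = defaultdict(float), defaultdict(int)
    for row in ratings:
        k = row[key_index]
        sums[k] += row[2] - mu
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

user_bias = mean_offsets(0)  # some raters are systematically generous or harsh
item_bias = mean_offsets(1)  # some movies score higher across all raters

# Crude temporal term: a per-movie bias within coarse time bins, capturing
# drift such as older movies being rated higher over time.
BIN = 365  # days per bin; an arbitrary choice for this sketch
tsums, tcounts = defaultdict(float), defaultdict(int)
for u, i, r, t in ratings:
    key = (i, t // BIN)
    tsums[key] += r - mu - user_bias[u] - item_bias[i]
    tcounts[key] += 1
temporal_bias = {k: tsums[k] / tcounts[k] for k in tsums}

def predict(user, item, day):
    """Predicted rating = global mean + user bias + item bias + time term."""
    return (mu + user_bias.get(user, 0.0) + item_bias.get(item, 0.0)
            + temporal_bias.get((item, day // BIN), 0.0))
```

Each term absorbs one of the peculiarities the post describes: the user bias accounts for people interpreting the rating scale differently, and the binned term lets an item's typical rating shift with time.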
In both cases, looking closely at the
data, better understanding how people
behave, and then adapting the models
yielded substantial gains. Combined
with other work, that was enough to
win the million-dollar prize.
The Netflix Prize followed a pattern
you often see when people try to implement a feature that requires machine
learning. Most of the early attempts threw off-the-shelf algorithms at the data, yielding something that worked, but without particularly impressive results.
Without a clear metric for success
and a way to test against that metric,
development stops there. But, like
Google and Amazon do with ubiquitous A/B testing, the Netflix Prize had
a clear metric for success and a way to
test against that metric.
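The prize's yardstick was root-mean-squared error (RMSE) on a held-out set of ratings. A minimal sketch of that kind of evaluation might look like the following; the held-out ratings and the two models' predictions are made-up numbers for illustration.

```python
import math

def rmse(predictions, actuals):
    """Root-mean-squared error: the Netflix Prize's success metric."""
    assert len(predictions) == len(actuals)
    total = sum((p - a) ** 2 for p, a in zip(predictions, actuals))
    return math.sqrt(total / len(actuals))

# Hypothetical held-out ratings and two competing predictors' outputs.
held_out = [4, 3, 5, 2]
generic_model = [3.6, 3.6, 3.6, 3.6]  # always predicts the overall mean
tuned_model = [4.2, 2.9, 4.6, 2.3]    # adjusts for user/item peculiarities
```

With a fixed held-out set and a single number to beat, every proposed change can be scored immediately, which is what kept the contest's experimentation loop turning.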
There are a lot of lessons that can be
taken from the Netflix contest, but a big
one should be the importance of constant experimentation and learning.
By pitting algorithms against each
other, by looking carefully at the data,
by thinking about what people want
and why they do what they do, and by
continuous testing and experimentation, you can reap big gains.
From Judy Robertson's Outreach: Meeting the
I just spent the afternoon
working with teenagers at
some of our summer school workshops.
As luck would have it, we had two different sessions running on the same afternoon, and while galloping between
labs, it occurred to me that some interesting
things were going on. First, a bit about
the workshops: the summer schools
were both for 17- and 18-year-olds, both
were set up to encourage young people
to study computer science, and both
involved building virtual worlds. One
of the workshops, on making computer
games using the Neverwinter Nights 2