The Communications Web site, cacm.acm.org,
features 13 bloggers in the BLOG@CACM
community. In each issue of Communications,
we’ll publish excerpts from some of
their posts, plus readers’ comments.
DOI:10.1145/1506409.1506434
cacm.acm.org/blogs/blog-cacm
Recommendation
Algorithms, Online
Privacy, and More
Greg Linden, Jason Hong, Michael Stonebraker, and Mark Guzdial
discuss recommendation algorithms, online privacy, scientific
databases, and programming in introductory computer
science classes.
From Greg Linden’s
“What Is a Good
Recommendation
Algorithm?”
Netflix is offering one
million dollars for a better recommendation engine. Better
recommendations clearly are worth a
lot.
But what are better recommendations? What do we mean by “better”?
In the Netflix Prize, the meaning of
better is quite specific. It is the root
mean squared error (RMSE) between
the actual ratings Netflix customers
gave the movies and the predictions of
the algorithm.
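
As a minimal sketch of that metric, the function below computes RMSE over paired lists of actual and predicted ratings. The sample ratings are invented for illustration and are not Netflix Prize data; the function name `rmse` is our own.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings, invented for illustration only.
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.5, 3.0, 4.0, 3.0]
score = rmse(actual_ratings, predicted_ratings)
print(f"RMSE: {score:.2f}")
```

A lower value means the predictions sit closer to the ratings people actually gave, averaged over every movie in the list.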
Let’s say we build a recommender
that wins the contest. We reduce the
error between our predictions and
what people actually will rate by 10%
over what Netflix used to be able to do.
Is that good?
Depending on what we want, it
might be very good. If what we want
to do is show people how much they
might like a movie, it would be good
to be as accurate as possible on every
movie.
However, this might not be what
we want. Even in a feature that shows
people how much they might like any
particular movie, people care a lot
more about misses at the extremes.
For example, it could be much worse
to say that you will be lukewarm (a
prediction of 3½ stars) on a movie you
love (an actual of 4½ stars) than to say
you will be slightly less lukewarm (a
prediction of 2½ stars) on a movie you
are lukewarm about (an actual of 3½
stars).
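
To make the point concrete, the sketch below shows that plain squared error scores these two misses identically, since each is off by exactly one star. The weighted variant is purely a hypothetical illustration of penalizing misses at the extremes more heavily; the weighting scheme and the `midpoint` parameter are our assumptions, not part of the Netflix Prize metric.

```python
def squared_error(actual, predicted):
    """Contribution of a single prediction to the squared-error sum."""
    return (actual - predicted) ** 2

# The two misses from the example above: both are off by one star.
miss_on_loved = squared_error(actual=4.5, predicted=3.5)
miss_on_lukewarm = squared_error(actual=3.5, predicted=2.5)

def weighted_squared_error(actual, predicted, midpoint=3.0):
    """Hypothetical variant: the weight grows as the actual rating
    moves away from an assumed scale midpoint of 3 stars."""
    weight = 1.0 + abs(actual - midpoint)
    return weight * (actual - predicted) ** 2

weighted_loved = weighted_squared_error(actual=4.5, predicted=3.5)
weighted_lukewarm = weighted_squared_error(actual=3.5, predicted=2.5)

print(miss_on_loved, miss_on_lukewarm)
print(weighted_loved, weighted_lukewarm)
```

Under the unweighted measure the two errors are indistinguishable; under the assumed weighting, the miss on the loved movie costs more.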
Moreover, what we often want is
not to make a prediction for any movie, but to find the best movies. In TopN
recommendations, a recommender is
trying to pick the best 10 or so items
for someone.
A recommender that does a good
job predicting across all movies might
not do the best job predicting the
TopN movies. RMSE equally penalizes
errors on movies you do not care about
seeing as it does errors on great movies, but perhaps what we really care
about is minimizing the error when
predicting great movies.
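
A small sketch of how the two goals can diverge: below, one made-up recommender has the lower RMSE across all movies yet misses both of the viewer's favorites in its top two, while another has the higher RMSE yet picks both. All ratings, movie IDs, and function names are invented for illustration.

```python
import math

def rmse(actual, predicted):
    """RMSE over every movie in the `actual` dict."""
    total = sum((actual[m] - predicted[m]) ** 2 for m in actual)
    return math.sqrt(total / len(actual))

def top_n(scores, n):
    """The n movie IDs with the highest scores."""
    return set(sorted(scores, key=scores.get, reverse=True)[:n])

def top_n_hits(actual, predicted, n):
    """How many of the true top-n movies appear in the predicted top-n."""
    return len(top_n(actual, n) & top_n(predicted, n))

# Hypothetical data: m1 and m2 are the viewer's favorites.
actual = {"m1": 5.0, "m2": 5.0, "m3": 3.0, "m4": 2.0, "m5": 1.0, "m6": 1.0}
# Recommender A is close on weak movies but underrates the great ones.
rec_a = {"m1": 3.0, "m2": 3.0, "m3": 3.5, "m4": 3.2, "m5": 1.0, "m6": 1.0}
# Recommender B nails the great movies but overrates everything else.
rec_b = {"m1": 5.0, "m2": 5.0, "m3": 4.5, "m4": 4.0, "m5": 3.0, "m6": 3.0}

rmse_a, rmse_b = rmse(actual, rec_a), rmse(actual, rec_b)
hits_a, hits_b = top_n_hits(actual, rec_a, 2), top_n_hits(actual, rec_b, 2)

print(f"A: RMSE {rmse_a:.2f}, top-2 hits {hits_a}")
print(f"B: RMSE {rmse_b:.2f}, top-2 hits {hits_b}")
```

The data is contrived to make the gap obvious, but it illustrates why the better RMSE score does not guarantee the better TopN list.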
There are parallels here with Web
search. Web search engines primarily
care about precision (relevant results
in the top 10 or top three). They only
care about recall when someone would
notice something they need missing
from the results they are likely to see.
Search engines do not care about errors
scoring arbitrary documents, just their
ability to find the TopN documents.
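
For readers less familiar with the two search measures, here is a minimal sketch of precision and recall computed over the top k results. The document IDs and relevance judgments are hypothetical, and dividing precision by the number of results actually returned (rather than by k) is one of several conventions.

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top-k results that are relevant."""
    top_k = ranked_results[:k]
    if not top_k:
        return 0.0
    return sum(1 for doc in top_k if doc in relevant) / len(top_k)

def recall_at_k(ranked_results, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    return len(set(ranked_results[:k]) & relevant) / len(relevant)

# Hypothetical ranking and relevance judgments.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = {"d1", "d3", "d9", "d10"}

p3 = precision_at_k(ranked, relevant_docs, 3)
r3 = recall_at_k(ranked, relevant_docs, 3)
print(f"precision@3 = {p3:.2f}, recall@3 = {r3:.2f}")
```

Neither measure looks at how documents outside the top k were scored, which matches the point that errors on arbitrary documents do not matter to the searcher.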
Aggravating matters further, in
both recommender systems and Web
search, people’s perception of quality
is easily influenced by factors other
than the items shown. People hate
slow Web sites and perceive slowly
appearing results to be worse than
fast-appearing results. Differences in
the information provided about each
item, especially missing data or misspellings, can influence perceived
quality. Presentation issues, even the
color of links, can change how people
focus their attention and which recommendations they see. People trust
recommendations more when the engine can explain why it made them.
People like recommendations that update immediately when new information is available. Diversity is valued;
near duplicates disliked. New items
attract attention, but people tend to
judge unfamiliar or unrecognized recommendations harshly.
In the end, what we want is happy,
satisfied users. Will a recommenda-