The Communications Web site, cacm.acm.org, features 13 bloggers in the BLOG@CACM community. In each issue of Communications, we’ll publish excerpts from some of their posts, plus readers’ comments.
Greg Linden, Jason Hong, Michael Stonebraker, and Mark Guzdial discuss recommendation algorithms, online privacy, scientific databases, and programming in introductory computer science classes.
From Greg Linden’s “What is a Good Recommendation Algorithm?”
Netflix is offering one million dollars for a better recommendation engine. Better recommendations clearly are worth a lot.
But what are better recommendations? What do we mean by “better”?
In the Netflix Prize, the meaning of better is quite specific. It is the root mean squared error (RMSE) between the actual ratings Netflix customers gave the movies and the predictions of the algorithm.
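As a minimal sketch of that metric, the following computes RMSE over paired lists of actual and predicted ratings. The function name `rmse` and the sample ratings are illustrative assumptions, not Netflix data or the contest's scoring code.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length rating lists."""
    if len(actual) != len(predicted):
        raise ValueError("rating lists must have the same length")
    if not actual:
        raise ValueError("need at least one rating")
    # Square each prediction error, average the squares, then take the root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical ratings on a 1-5 star scale (made up for illustration).
actual_ratings = [4.0, 3.0, 5.0, 2.0]
predicted_ratings = [3.0, 3.0, 4.0, 2.0]
score = rmse(actual_ratings, predicted_ratings)
print(f"RMSE: {score:.4f}")
```

Because the errors are squared before averaging, a lower score means predictions sit closer to the actual ratings on average, regardless of which movies the errors fall on.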
Let’s say we build a recommender that wins the contest. We reduce the error between our predictions and the ratings people actually give by 10% compared with what Netflix used to be able to do. Is that good?
Depending on what we want, it might be very good. If what we want to do is show people how much they might like a movie, it would be good to be as accurate as possible on every movie.
However, this might not be what we want. Even in a feature that shows people how much they might like any particular movie, people care a lot more about misses at the extremes. For example, it could be much worse to say that you will be lukewarm (a prediction of 3½ stars) on a movie you love (an actual of 4½ stars) than to say you will be slightly less lukewarm (a prediction of 2½ stars) on a movie you are lukewarm about (an actual of 3½ stars).
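To make the point concrete, here is a small sketch, assuming the half-star ratings in the example (3½ predicted against 4½ actual, and 2½ predicted against 3½ actual). The helper `squared_error` is a hypothetical name; it shows the per-movie term that feeds into RMSE.

```python
def squared_error(actual, predicted):
    """Per-movie contribution to the sum inside RMSE."""
    return (actual - predicted) ** 2

# The two misses from the example, in stars.
miss_on_loved_movie = squared_error(actual=4.5, predicted=3.5)
miss_on_lukewarm_movie = squared_error(actual=3.5, predicted=2.5)

print(miss_on_loved_movie, miss_on_lukewarm_movie)
```

Both misses are off by exactly one star, so they contribute the same amount to RMSE even though the first one may matter far more to the person receiving the recommendation.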
Moreover, what we often want is not to make a prediction for any movie, but to find the best movies. In TopN recommendations, a recommender is trying to pick the best 10 or so items for someone.
A recommender that does a good job predicting across all movies might not do the best job predicting the TopN movies. RMSE equally penalizes errors on movies you do not care about seeing as it does errors on great movies, but perhaps what we really care about is minimizing the error when predicting great movies.
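The gap between overall accuracy and TopN quality can be sketched with two hypothetical predictors. All movie names, ratings, and model names below are invented for illustration: one model has the lower RMSE but picks the wrong top two, while the other has larger errors on poor movies yet ranks the best ones correctly.

```python
import math

def rmse(actual, predicted):
    """RMSE over dicts keyed by movie name."""
    total = sum((actual[m] - predicted[m]) ** 2 for m in actual)
    return math.sqrt(total / len(actual))

def top_n(ratings, n):
    """Set of the n highest-rated movie names."""
    return set(sorted(ratings, key=ratings.get, reverse=True)[:n])

# Hypothetical actual ratings for one user (illustrative only).
actual = {"movie_a": 5.0, "movie_b": 4.5, "movie_c": 4.0,
          "movie_d": 2.0, "movie_e": 1.0}

# Small errors everywhere, but the order of the best movies is scrambled.
low_rmse_model = {"movie_a": 4.3, "movie_b": 4.1, "movie_c": 4.4,
                  "movie_d": 2.0, "movie_e": 1.0}

# Exact on the great movies, badly off on the ones nobody wants to see.
good_ranking_model = {"movie_a": 5.0, "movie_b": 4.5, "movie_c": 4.0,
                      "movie_d": 3.5, "movie_e": 2.5}

for name, model in [("low_rmse_model", low_rmse_model),
                    ("good_ranking_model", good_ranking_model)]:
    hits = len(top_n(model, 2) & top_n(actual, 2))
    print(f"{name}: RMSE={rmse(actual, model):.3f}, top-2 hits={hits}/2")
```

Under these made-up numbers, the model that would score better on an RMSE leaderboard is the one that recommends a worse top two.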
There are parallels here with Web search. Web search engines primarily care about precision (relevant results in the top 10 or top three). They only care about recall when someone would notice something they need missing from the results they are likely to see. Search engines do not care about errors scoring arbitrary documents, just their ability to find the TopN documents.
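For readers less familiar with the search terminology, precision and recall over the top k results can be sketched as follows. The function names, document IDs, and relevance judgments are assumptions made for illustration, not taken from any particular search engine.

```python
def precision_at_k(ranked_results, relevant, k):
    """Fraction of the top k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranked_results[:k] if doc in relevant)
    return hits / k

def recall_at_k(ranked_results, relevant, k):
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc in ranked_results[:k] if doc in relevant)
    return hits / len(relevant)

# Hypothetical ranking and relevance judgments.
ranked = ["d1", "d2", "d3", "d4", "d5"]
relevant_docs = {"d1", "d3", "d7", "d8"}

p_at_3 = precision_at_k(ranked, relevant_docs, 3)
r_at_3 = recall_at_k(ranked, relevant_docs, 3)
print(f"precision@3={p_at_3:.2f}, recall@3={r_at_3:.2f}")
```

Neither measure looks at the scores assigned to documents outside the top k, which mirrors the point that errors on arbitrary documents matter little as long as the top of the list is right.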
Aggravating matters further, in both recommender systems and Web search, people’s perception of quality is easily influenced by factors other than the items shown. People hate slow Web sites and perceive slowly appearing results to be worse than fast-appearing results. Differences in the information provided about each item, especially missing data or misspellings, can influence perceived quality. Presentation issues, even the color of links, can change how people focus their attention and which recommendations they see. People trust recommendations more when the engine can explain why it made them. People like recommendations that update immediately when new information is available. Diversity is valued; near duplicates disliked. New items attract attention, but people tend to judge unfamiliar or unrecognized recommendations harshly.
In the end, what we want is happy, satisfied users. Will a recommenda-