With sentiment analysis algorithms, companies can identify and assess the wide variety of opinions found online and create computational models of human opinion.
Facts may be stubborn things, as John Adams once put it, but at least they’re easy to compute. From a data-processing perspective, opinions are much more stubborn.
In recent years, the Web has created a bull market in human opinion: movie reviews, product ratings, restaurant recommendations, and all kinds of other viewpoints expressed in articles, blogs, discussion groups, and elsewhere. As the Web accumulates more and more data, many of us rely on each other’s opinions as a filter to help us make informed decisions. For many businesses, customer opinions have become a type of virtual currency that can make or break their products. As opinion data plays an increasingly important role on the Web, however, computer scientists are discovering the limitations of traditional text analytics algorithms for sorting opinions from raw facts.
The distinction between facts and opinions might seem clear enough on the surface, but in practice teasing them apart involves parsing many linguistic shades of gray. This is where the emerging field known as sentiment analysis comes in. Sometimes called opinion mining or subjectivity analysis, sentiment analysis is a new term that broadly refers to the identification and assessment of opinions, which for the purposes of computation might be defined as written expressions of subjective mental states.
An example of sentiment analysis from Michael Gamon in which topics from reviews of the Volkswagen Golf are depicted. The size of each topic box indicates the number of mentions of the topic, and the shading of each topic box indicates the average sentiment, ranging from negative (red) to neutral/none (white) to positive (green).
Traditional text analytics algorithms work by scanning a body of text to extract and analyze keywords. That approach works well for identifying simple factual statements, but assessing opinions requires delving much deeper into the subtleties of human language. “Sentiments are very different from conventional facts,” says analytics consultant Seth Grimes. While direct expressions of opinion are fairly easy to spot—for example, “I hated Revenge of the Sith”—most human sentiments fall somewhere along a continuum from objective fact to subjective experience. For example, “It’s fifteen degrees outside” is an objective statement; “It’s cold” reveals a somewhat more subjective point of view; while “I’m putting on two pairs of socks” constitutes a completely indirect expression of opinion disguised as a statement of fact.
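To make the limits of keyword scanning concrete, here is a minimal Python sketch run over the example sentences above. The keyword set and the function name `find_opinion_keywords` are hypothetical, chosen only for illustration; no particular product or published algorithm is implied.

```python
import re

# Hypothetical, hand-picked opinion keywords (an assumption for this sketch);
# a real lexicon would be far larger.
OPINION_KEYWORDS = {"hated", "loved", "awful", "great", "cold"}


def find_opinion_keywords(sentence: str) -> list[str]:
    """Return the opinion keywords found in the sentence, in order of appearance."""
    # Lowercase the text and split it into simple word tokens.
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [tok for tok in tokens if tok in OPINION_KEYWORDS]


SENTENCES = [
    "I hated Revenge of the Sith",
    "It's fifteen degrees outside",
    "It's cold",
    "I'm putting on two pairs of socks",
]

if __name__ == "__main__":
    for sentence in SENTENCES:
        hits = find_opinion_keywords(sentence)
        print(f"{sentence!r}: {hits if hits else 'no opinion keywords found'}")
```

A scan like this flags the direct statement about the movie and the word "cold," but it finds nothing in the sentence about socks, even though a human reader hears the same complaint about the weather.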
“We are dealing with sentiment that can be expressed in subtle ways,” says Yahoo! researcher Bo Pang, co-author of the book Opinion Mining and Sentiment Analysis. To penetrate those subtleties, sentiment analysis algorithms assess written statements through a series of overlapping filters. They usually begin by attempting to determine the polarity of a particular sentiment—i.e., Is it positive or negative? Once that’s established, they may try to determine the intensity of sentiment being expressed—i.e., How positive or negative is this statement? Next, an even more subtle layer of analysis might attempt to determine the degree of subjectivity—i.e., How partial or impartial is the point of view being expressed here? (This is often determined by looking at
Screenshot courtesy of Michael Gamon
14 Communications of the ACM | April 2009 | Vol. 52 | No. 4
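The layered filters described in the article (polarity first, then intensity, then subjectivity) might be sketched roughly as follows. Everything in this Python example is an assumption made for illustration: the tiny lexicon, the intensifier weights, and the function name `analyze` are hypothetical. In particular, the article's account of how subjectivity is determined is cut off, so the ratio used here is only a stand-in, not the method the article goes on to describe.

```python
import re

# Hypothetical lexicon (an assumption): word -> signed strength,
# where negative values mark unfavorable words.
LEXICON = {"hated": -2.0, "bad": -1.0, "cold": -0.5, "good": 1.0, "loved": 2.0}

# Hypothetical intensifiers (an assumption) that scale the word right after them.
INTENSIFIERS = {"really": 1.5, "very": 1.5, "somewhat": 0.5}


def analyze(sentence: str) -> dict:
    """Apply three toy filters: polarity, intensity, and subjectivity."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    opinion_hits = 0
    scale = 1.0
    for tok in tokens:
        if tok in INTENSIFIERS:
            # Remember the multiplier for the next token only.
            scale = INTENSIFIERS[tok]
            continue
        if tok in LEXICON:
            score += scale * LEXICON[tok]
            opinion_hits += 1
        scale = 1.0

    # Filter 1: polarity is the sign of the summed score.
    if score > 0:
        polarity = "positive"
    elif score < 0:
        polarity = "negative"
    else:
        polarity = "neutral"

    # Filter 2: intensity is the magnitude of the summed score.
    intensity = abs(score)

    # Filter 3: subjectivity, approximated here (placeholder assumption)
    # as the share of tokens that are lexicon words.
    subjectivity = opinion_hits / len(tokens) if tokens else 0.0

    return {"polarity": polarity, "intensity": intensity, "subjectivity": subjectivity}


if __name__ == "__main__":
    for text in ["I really hated Revenge of the Sith", "It's fifteen degrees outside"]:
        print(text, "->", analyze(text))
```

Each filter here reads from the same pass over the tokens, which is one simple way to express "overlapping" layers; a production system would more plausibly use trained classifiers for each stage.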