Society | DOI: 10.1145/1897852.1897860
Neil Savage
Twitter as Medium and Message
Researchers are mining Twitter’s vast flow of data to measure public sentiment,
follow political activity, and detect earthquakes and flu outbreaks.
Twitter generates a lot of noise. One hundred sixty million users send upward of 90 million messages per day: 140-character musings, studded with misspellings,
slang, and abbreviations, on what they
had for lunch, the current episode of
“Glee,” or a video of a monkey petting
a porcupine that you just have to watch.
Individually, these tweets range
from the inane to the arresting. But
taken together, they open a surprising
window onto the moods, thoughts, and
activities of society at large. Researchers are finding they can measure public
sentiment, follow political activity, even
spot earthquakes and flu outbreaks,
just by running the chatter through
algorithms that search for particular
words and pinpoint message origins.
“Social media give us an opportunity
we didn’t have until now to track
what everybody is saying about everything,”
says Filippo Menczer, associate
director of the Center for Complex
Networks and Systems Research at
Indiana University. “It’s amazing.”
The results can be surprisingly accurate.
Aron Culotta, assistant professor
of computer science at Southeastern
Louisiana University, found that tracking
a few flu-related keywords allowed
him to predict future flu outbreaks. He
used a simple keyword search to look
at 500 million messages sent from
September 2009 to May 2010. Just finding
the word “flu” produced an 84%
correlation with statistics collected by the
U.S. Centers for Disease Control and
Prevention (CDC). Adding a few other
words, like “have” and “headache,”
increased the agreement to 95%.
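As a rough illustration of the keyword approach described above, the sketch below computes, for each week, the fraction of messages that contain a keyword, then the Pearson correlation between that series and a reference series. The function names, the toy messages, and the reference numbers are all invented for illustration; they are not Culotta's data, and this is not necessarily the exact procedure he used.

```python
import math
import re


def keyword_fraction(messages, keywords):
    """Return the fraction of messages containing at least one keyword."""
    if not messages:
        return 0.0
    # Match whole words only, ignoring case.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    hits = sum(1 for message in messages if pattern.search(message))
    return hits / len(messages)


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (spread_x * spread_y)


# Hypothetical toy data: one list of messages per week.
weekly_messages = [
    ["home sick with flu", "great lunch today", "watching Glee", "traffic is awful"],
    ["I have the flu", "flu and a headache", "new episode tonight", "so tired"],
    ["flu again", "headache all day, think I have flu", "flu season is here", "monkey video!"],
]
# Made-up weekly reference rates standing in for official statistics.
reference_rates = [1.0, 2.0, 3.5]

tweet_rates = [keyword_fraction(week, ["flu"]) for week in weekly_messages]
r = pearson(tweet_rates, reference_rates)
print(tweet_rates, round(r, 3))
```

With real data, the weekly keyword series would be compared against the published figures for the same weeks; a high correlation suggests the cheap signal tracks the expensive one, though it says nothing about why.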
[Figure: Truthy shows how a tweet propagates, with retweets in blue and topic mentions in orange. Tweets that are sent back and forth between two Twitter accounts appear as a thick blue bar.]
The CDC’s counts of what it terms
influenza-like illness are based on
doctors’ reports of specific symptoms
in their patients, so they’re probably
a more accurate measure of actual illness
than somebody tweeting “home
sick with flu.” But it can take a week
or two for the CDC to collect the data
and disseminate the information, by
which time the disease has almost certainly
spread. Twitter reports, though
less precise, are available in real time,
and cost a lot less to collect. They could
draw health officials’ attention to an
outbreak in its earlier stages. “We’re
certainly not recommending that the
CDC stop tracking the flu the way they
do it now,” Culotta says. “It would be
nice to use this as a first-pass alarm.”
[Pull quote: Twitter data may help answer sociological questions that are otherwise hard to approach, because polling enough people is too expensive and time consuming.]
Google Flu Trends does something
similar. One potential point in Twitter’s
favor is that a tweet contains more
words, and therefore more clues to
meaning, than the three or four words
of a typical search engine query. And
training algorithms to classify messages,
filtering out the tweets that talk
about flu shots or Bieber Fever,
improves the accuracy further.
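To make the idea of a trained message filter concrete, here is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing. The article does not say which classification algorithm the researchers used, so naive Bayes is an assumption chosen for brevity, and the labels, training messages, and function names are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def train(examples):
    """Build per-label word counts from (text, label) pairs."""
    word_counts = {}          # label -> Counter of word frequencies
    doc_counts = Counter()    # label -> number of training messages
    vocab = set()
    for text, label in examples:
        doc_counts[label] += 1
        counts = word_counts.setdefault(label, Counter())
        for token in tokenize(text):
            counts[token] += 1
            vocab.add(token)
    return word_counts, doc_counts, vocab


def classify(model, text):
    """Return the label with the highest smoothed log-probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        counts = word_counts[label]
        total_words = sum(counts.values())
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token in vocab:  # words never seen in training are ignored
                score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical hand-labeled training messages.
training = [
    ("home sick with flu", "illness"),
    ("i have the flu and a headache", "illness"),
    ("flu fever all night", "illness"),
    ("got my flu shot today", "other"),
    ("bieber fever is real", "other"),
    ("free flu shot at the pharmacy", "other"),
]
model = train(training)

incoming = ["stuck home with flu and headache", "got a flu shot today", "bieber fever"]
kept = [message for message in incoming if classify(model, message) == "illness"]
print(kept)
```

Only messages labeled as illness reports would then be counted by a keyword tracker. A real filter would need far more labeled examples and better tokenization than this toy version.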
There are other physical phenomena where Twitter can be an add-on to
existing monitoring methods. Air Twit-
Image courtesy of truthy.indiana.edu