The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
Follow us on Twitter at http://twitter.com/blogCACM
DOI: 10.1145/2160718.2160721
http://cacm.acm.org/blogs/blog-cacm
Likert-type scales,
statistical methods,
and effect sizes
Judy Robertson writes about researchers’ use of the wrong
statistical techniques to analyze attitude questionnaires.
ANOVA instead of nonparametric
counterparts? Kaptein, Nass, and Markopoulos3 suggest it is because HCI
researchers know that nonparametric
tests lack power. This means they are
worried the nonparametric tests will
fail to find an effect where one exists. They
also suggest it is because there aren’t
handy nonparametric tests that let
you do analysis of factorial designs. So
what’s a researcher to do?
Judy Robertson
“Stats: We’re Doing
It Wrong”
http://cacm.acm.org/blogs/blog-cacm/107125
April 4, 2011
It is quite common for HCI or computer science education researchers to
use attitude questionnaires to examine people’s opinions of new software
or teaching interventions. These are
often on a Likert-type scale of
“strongly agree” to “strongly disagree.” And
the sad truth is that researchers typically use the wrong statistical techniques to analyze them. Kaptein, Nass,
and Markopoulos3 published a paper
in CHI last year that found that in the
previous year’s CHI proceedings, 45%
of the papers reported on Likert-type
data, but only 8% used nonparametric stats to do the analysis. Ninety-five
percent reported on small sample
sizes (under 50 people). This is statistically problematic even if it gets past
reviewers! Here’s why.
Likert-type scales give ordinal data.
That is, the data is ranked: “strongly
agree” is usually better than “agree.”
However, it is not interval data. You
cannot say the distance between
“strongly agree” and “agree” is
the same as that between “neutral” and “disagree,”
for example. People tend to think there
is a bigger difference between items at
the extremes of the scale than in the
middle (there is some evidence cited
in Kaptein et al.’s paper that this is the
case). For ordinal data, one should use
nonparametric statistical tests (so 92%
Robust modern statistical methods
It turns out that statisticians have
been busy in the last 40 years inventing improved tests that are not vulnerable to various problems that classic
parametric tests stumble across with
real-world data and which are also at
least as powerful as classic parametric
tests (Erceg-Hurn and Mirosevich1).
Why this is not mentioned in psychology textbooks is not clear to me. It
must be quite annoying for statisticians to have their research ignored! A
catch about modern robust statistical
methods is that you cannot use SPSS
to do them. You have to start messing around with extra packages in R
or SAS, which are slightly more frightening than SPSS, which itself is not a
model of usability. Erceg-Hurn and
Mirosevich1 and Kaptein, Nass, and
Markopoulos3 both describe ANOVA-type statistics, which are powerful,
usable in factorial designs, and
work for nonparametric data.
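To make the rank-based idea behind nonparametric tests concrete, here is a minimal Python sketch. It is not the ANOVA-type statistic described in those papers; it substitutes the much simpler Mann-Whitney U statistic for two independent groups, and it stops short of computing a p-value. The 1-to-5 coding of the Likert responses, the two sample groups, and the function names are all assumptions made up for illustration.

```python
from statistics import median

# Assumed coding (hypothetical): 1 = "strongly disagree" ... 5 = "strongly agree".


def midranks(values):
    """Rank values from 1..n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the end of the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U_a, U_b) for two independent samples, using midranks for ties."""
    ranks = midranks(list(a) + list(b))
    rank_sum_a = sum(ranks[:len(a)])
    u_a = rank_sum_a - len(a) * (len(a) + 1) / 2
    u_b = len(a) * len(b) - u_a
    return u_a, u_b


# Made-up responses from two hypothetical groups of six participants each.
group_a = [5, 4, 4, 5, 3, 4]
group_b = [2, 3, 3, 1, 4, 2]

u_a, u_b = mann_whitney_u(group_a, group_b)
print("medians:", median(group_a), median(group_b))
print("U statistics:", u_a, u_b)
```

The statistic uses only the ordering of the responses, never the distances between them, which is why it suits ordinal data. Summarizing each group by its median rather than its mean follows the same logic.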
A lot of interval data from behavioral research, such as reaction times,
does not have a normal distribution or
is heteroscedastic (groups have unequal