Figure 4. Comparing process predictors on the Law and Hvannberg study.8
[Plot of discovery rate (0.0 to 1.0) for the Binomial, Binomial-GT, and LNBzt models; the values 11, 16, and 56 are marked.]
ity researchers predict the progress
of the evaluation process through the
derived logit-normal geometric formula. 16 For the Law and Hvannberg study,8 it predicts a sample size of n = 56 participants to reach the 80% discovery target (see Figure 4), taking HCI researchers well beyond the 10±2 rule or any other magic number suggested in the literature.
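As a rough illustration of how such a prediction can be computed, the sketch below searches for the smallest number of sessions at which the expected discovery rate reaches a target. It does so once for a single visibility p (the geometric model) and once for logit-normally distributed visibility. The function names and the parameter values (p, mu, sigma) are assumptions chosen for illustration, not estimates from the Law and Hvannberg data, and the grid-based integration is a simple stand-in rather than the derived formula cited above.

```python
import math

def logistic(z):
    """Inverse logit: maps a real-valued latent visibility to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def discovery_rate_geometric(n, p):
    """Expected share of problems found after n sessions if all share visibility p."""
    return 1.0 - (1.0 - p) ** n

def discovery_rate_logit_normal(n, mu, sigma, grid=2001, width=8.0):
    """Expected share found after n sessions when logit(p) ~ Normal(mu, sigma).

    The integral over the latent variable is approximated by a normalized
    sum over an evenly spaced grid (a simplification made for this sketch).
    """
    zs = [mu + sigma * (-width + 2.0 * width * i / (grid - 1)) for i in range(grid)]
    weights = [math.exp(-0.5 * ((z - mu) / sigma) ** 2) for z in zs]
    total = sum(weights)
    missed = sum(w * (1.0 - logistic(z)) ** n for z, w in zip(zs, weights)) / total
    return 1.0 - missed

def sessions_needed(rate_fn, target, max_n=10000):
    """Smallest n whose expected discovery rate reaches the target."""
    for n in range(1, max_n + 1):
        if rate_fn(n) >= target:
            return n
    raise ValueError("target not reached within max_n sessions")

# Hypothetical parameters, for illustration only.
p = 0.3
mu, sigma = math.log(p / (1.0 - p)), 1.5
n_geom = sessions_needed(lambda n: discovery_rate_geometric(n, p), 0.8)
n_lnb = sessions_needed(lambda n: discovery_rate_logit_normal(n, mu, sigma), 0.8)
print(n_geom, n_lnb)
```

With these made-up values the heterogeneous model asks for more sessions than the geometric one. How large the gap is depends entirely on the assumed variance.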
[Figure 4, x-axis: Number of Sessions (sample size), 0 to 70.]
progress of usability studies. 16, 17 When problem visibility varies, progress toward finding new problems is somewhat quicker in early sessions but decelerates relative to the geometric model as sample size increases. The reason is that easy-to-discover problems show up early in the study. Once discovered, they are frequently rediscovered, forming the fat right tail of the frequency distribution. These recurrences increase the estimated average probability p but do not contribute to the study’s progress, as progress is measured only in terms of finding new problems. Moreover, with increased variance come more intractable problems (the fat left tail), and revealing them requires much more effort than the geometric series model would predict.c
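The deceleration can be made concrete with a small sketch. It computes the per-session hazard, meaning the chance that a still-undiscovered problem is found in the next session, for two hypothetical problem sets with the same mean visibility. The visibilities and the function names are assumptions made for illustration; they are not taken from any study.

```python
def survival(n, visibilities):
    """Expected share of problems still undiscovered after n sessions."""
    return sum((1.0 - p) ** n for p in visibilities) / len(visibilities)

def hazard(n, visibilities):
    """Chance that a still-undiscovered problem is found in session n."""
    before = survival(n - 1, visibilities)
    return (before - survival(n, visibilities)) / before

# Hypothetical problem sets, both with a mean visibility of 0.3.
uniform = [0.3] * 10                 # every problem equally visible
mixed = [0.55] * 5 + [0.05] * 5      # easy and hard problems

for n in (1, 5, 10):
    print(n, round(hazard(n, uniform), 3), round(hazard(n, mixed), 3))
```

In the uniform set the hazard stays constant across sessions. In the mixed set it falls, because the easy problems are used up early and mostly hard ones remain.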
Not So Magical
Using the LNBzt model since 2008 to examine many usability studies, I can affirm that visibility variance is a fact and that strong incompleteness usually occurs for datasets smaller than n = 30 participants. Indeed, most studies I am aware of are much smaller, with only a few after 2001 adjusting for unseen events and not one accounting for visibility variance. The meta-study by Hwang and Salvendy6 carries both biasing factors (incompleteness and visibility variance) and thus most likely greatly understates the required sample size.
Having seen data from usability
studies take a variety of shapes, I hesitate to say the LNBzt model is the last
word in sample-size estimation. My
concern is that the LNBzt model still makes assumptions, and it is unclear how well they are satisfied for typical datasets “in the wild.” Proposing a single
number as the one-and-only solution
is even less justified, whether five, 10,
or 56.
Improved Prediction
Looking to account for variance of problem visibility, as well as unseen events, I proposed, in 2009, a mathematical model I call the “zero-truncated logit-normal binomial distribution,” or LNBzt. 16 It views problem visibility as a normally distributed latent property with unknown mean and variance, so the binomial parameter p can vary by a probability density function, which is exactly what the encyclopedia article by Turner et al. 19 neglected. Moreover, zero-truncation accounts for the unknown number of never-discovered problems.
c Rephrasing this in terms of reliability engineering, the geometric series model becomes the discrete version of the exponential distribution, resulting in a constant hazard function for a problem’s likelihood of being discovered. With visibility variance, the hazard function decreases over an increasing number of sessions.
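In symbols, the verbal definition of LNBzt might be written as follows. The notation is introduced here for illustration: k is the number of sessions in which a problem is discovered, n the number of sessions, z the latent visibility, and \mu and \sigma its mean and standard deviation. Reference 16 remains the authoritative statement of the model.

```latex
\begin{align}
p(z) &= \frac{1}{1 + e^{-z}}, \qquad z \sim \mathcal{N}(\mu, \sigma^2) \\
P(k \mid n, \mu, \sigma) &= \int_{-\infty}^{\infty} \binom{n}{k}\, p(z)^{k}\, \bigl(1 - p(z)\bigr)^{n-k}\,
  \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(z-\mu)^2}{2\sigma^2}}\, dz \\
P_{\mathrm{zt}}(k \mid n, \mu, \sigma) &= \frac{P(k \mid n, \mu, \sigma)}{1 - P(0 \mid n, \mu, \sigma)}, \qquad k = 1, \dots, n
\end{align}
```

The last line is the zero-truncation: problems discovered in no session (k = 0) cannot appear in the data, so the remaining probabilities are rescaled to sum to one.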
Figure 3 outlines the LNBzt model
fitted to the Law and Hvannberg dataset. Compared to the binomial model,
this distribution is more dispersed,
smoothly resembling the shape of the
observed data across the entire range.
It also estimates the number of not-yet-discovered problems at 74, compared
to eight with the binomial model and
20 with GT adjustment, suggesting the
study is only half complete.
The improved model fit can also be shown with more rigor than through visual inspection alone. Researchers can use a simple Monte-Carlo procedure to test for overdispersion.d, 17 A more sophisticated analysis is based on the method of maximum likelihood (ML) estimation. Several ways are available for comparing models fitted by the ML method; one is the Akaike Information Criterion (AIC). 2 The lower value for the LNBzt model (AIC=286, see Figure 3) compared to the binomial model (AIC=340, see Figure 1) confirms that LNBzt is a better fit with the observed data.e
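To give a sense of the size of this AIC difference, the sketch below applies the standard definitions AIC = 2k - 2 ln(L) and relative likelihood exp((AIC_best - AIC_other) / 2) to the two values reported above. The function names are illustrative. The log-likelihoods and parameter counts behind the reported values are not given in the text, so only the difference is evaluated.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L); lower values indicate a better fit."""
    return 2 * n_params - 2 * log_likelihood

def relative_likelihood(aic_other, aic_best):
    """How plausible the other model is relative to the best one, on a 0-to-1 scale."""
    return math.exp((aic_best - aic_other) / 2.0)

# AIC values as reported in the text for the Law and Hvannberg dataset.
aic_lnbzt, aic_binomial = 286, 340
print(relative_likelihood(aic_binomial, aic_lnbzt))
```

A difference of 54 AIC points leaves the binomial model with a vanishingly small relative likelihood.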
The LNBzt model also helps usabil-
d For a program and tutorial on the Monte-Carlo test for overdispersion see http://schmettow.info/Heterogeneity/
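The general idea of such a test can be sketched in a few lines. This is not the program referenced in the footnote. It is a simplified stand-in that ignores zero-truncation, and the function name, the example counts, and the number of simulations are all assumptions. It simulates datasets under a plain binomial model and reports how often their variance is at least as large as the observed one.

```python
import random
import statistics

def overdispersion_test(counts, n_sessions, n_sim=2000, seed=1):
    """Share of simulated binomial datasets whose variance reaches the observed one.

    counts holds, per problem, the number of sessions in which it was discovered.
    A small return value suggests more variance than a single p can explain.
    """
    rng = random.Random(seed)
    p_hat = sum(counts) / (len(counts) * n_sessions)   # pooled visibility estimate
    observed_var = statistics.pvariance(counts)
    hits = 0
    for _ in range(n_sim):
        # One simulated dataset: each problem is found with probability p_hat per session.
        sim = [sum(rng.random() < p_hat for _ in range(n_sessions)) for _ in counts]
        if statistics.pvariance(sim) >= observed_var:
            hits += 1
    return hits / n_sim

# Hypothetical discovery counts for ten problems over ten sessions.
counts = [1, 1, 1, 1, 1, 1, 1, 10, 10, 10]
print(overdispersion_test(counts, n_sessions=10))
```

Because observed data never contain problems with zero discoveries, a careful version would simulate from a zero-truncated model as well.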
Problem Population
Besides accounting for variance, the LNBzt approach has one remarkable advantage over Lewis’s predictor for required sample size: It allows for estimating the number of not-yet-discovered problems. The two approaches differ in the order of their steps: Lewis’s GT estimation first smooths the data by adding virtual data points for undiscovered problems, then estimates p, whereas the LNBzt method first estimates the parameters on the unmodified data, then determines the most likely number of unobserved problems. 16
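The second step can be illustrated with a simple plug-in calculation. If a fitted model implies that a problem stays unseen with probability P0, then the observed problems make up a share 1 - P0 of the total. The sketch below uses that relation. It is a simplification rather than the maximum likelihood procedure of reference 16, and the function name and example numbers are hypothetical.

```python
def estimate_undiscovered(n_observed, prob_unseen):
    """Plug-in estimate of the number of problems not yet seen.

    n_observed:  number of distinct problems found so far
    prob_unseen: model-implied probability that a problem is found in no session
    """
    if not 0.0 <= prob_unseen < 1.0:
        raise ValueError("prob_unseen must lie in [0, 1)")
    return n_observed * prob_unseen / (1.0 - prob_unseen)

# Hypothetical example: 60 problems observed, and the fitted model implies
# that 40% of all problems would remain unseen at this sample size.
print(estimate_undiscovered(60, 0.4))
```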
Recasting the goal from predicting
sample size to estimating the number
of remaining problems is not a wholly
new idea. In software inspection, the
so-called capture-recapture (CR) mod-