Based on the average p=0.31 over several studies, Nielsen later concluded
that 15 users is typically enough to
find virtually all problems,13
recommending three smaller studies of
five participants each (finding 85% of
problems in each group) for driving
iterative design cycles. Unfortunately,
researchers, students, and usability professionals alike misconstrued
Nielsen’s recommendations and began to believe a simplified version of
the rule: Finding 85% of the problems
is enough, and five users usually suffice to reach that target.
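The percentages quoted above can be checked with the geometric series formula usually attributed to Virzi, P(n) = 1 - (1 - p)^n. This section refers to that formula without restating it, so its form here is an assumption. The function name below is illustrative only.

```python
def discovery_rate(p: float, n: int) -> float:
    """Expected share of problems found after n sessions, assuming every
    problem is found in any one session with the same probability p."""
    if not 0.0 <= p <= 1.0 or n < 0:
        raise ValueError("p must lie in [0, 1] and n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# Average p = 0.31, as reported in the text above.
for n in (5, 15):
    print(n, round(discovery_rate(0.31, n), 3))
```

With p = 0.31 this gives roughly 84% for five users and more than 99% for 15, matching the "85%" and "virtually all" figures. The result depends entirely on p being the same for every problem and every study.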
This conclusion initiated the “five users is (not) enough” debate,
involving proponents and skeptics from research and industry.a Spool and
Schroeder18 reviewed an industrial dataset, concluding that complex
modern applications require a much larger sample size to reach a target
of 80% discovery. In 2001, Caulton3 said the probability of discovering
a particular problem likely differs among subgroups within a user
population. Likewise, Woolrych and Cockton22 presumed that heterogeneity
in the sample of either participants or experts could render Virzi’s
formula biased.
The debate has continued to ponder the mathematical foundation of the
geometric series model. In fact, the formula is grounded in another
well-known model, the binomial distribution, which addresses the
question of how often an individual problem is discovered through a
fixed number of trials (sample size or process size n). The binomial
model is based on three fundamental assumptions that likewise are
relevant for the geometric series model:
Independence. Discovery trials are stochastically independent;
Completeness. Observations are complete, such that the total number of
problems is known, including those not yet discovered; and
Homogeneity. The parameter p does not vary, such that all problems are
equally likely to be discovered within a study; I call the opposite of
this assumption “visibility variance.”
Observing that the average probability p varies across studies14 is a
strong argument against generalized assertions like “X test participants
suffice to find y of problems.” A mathematical solution for dealing with
uncertainty regarding p devised by Lewis9 suggested that estimating the
mean probability of discovery p from the first few sessions of a study
is helpful in predicting required sample size. Lewis also realized it is
not enough to take only the average rate of successful discovery events
as an estimator for p. The true total number of existing problems is
typically unknown a priori, thus violating the completeness assumption.
In incomplete studies, not-yet-discovered problems decrease estimated
probability. Ignoring incompleteness results in an optimistic bias for
the mean probability p. For a small sample size, Lewis suggested a
correction term for the number of undiscovered problems, or the
Good-Turing (GT) adjustment.
a For a comprehensive view of the debate see Jeff Sauro’s Web site
http://www.measuringusability.com/blog/five-history.php
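The binomial model and its three assumptions can be made concrete with a short sketch. It computes how many problems the model expects to be discovered exactly k times in n sessions, and how many it expects to remain unseen. All numbers and function names are hypothetical and chosen for illustration; none come from the studies cited here.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability that one problem is discovered exactly k times in n
    independent sessions, each with discovery probability p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def estimate_total(seen: int, n: int, p: float) -> float:
    """Total number of problems implied by the count seen at least once,
    assuming homogeneity (the same p for every problem)."""
    return seen / (1.0 - (1.0 - p) ** n)


def expected_frequencies(total: float, n: int, p: float) -> list:
    """Expected number of problems discovered 0, 1, ..., n times."""
    return [total * binomial_pmf(k, n, p) for k in range(n + 1)]


# Hypothetical study: 10 sessions, p = 0.2, 50 distinct problems observed.
n_sessions, p, seen = 10, 0.2, 50
total = estimate_total(seen, n_sessions, p)
freq = expected_frequencies(total, n_sessions, p)
unseen = freq[0]  # problems expected to be discovered zero times
print(round(total, 1), round(unseen, 1))
```

The entry `freq[0]` is the model's estimate of undiscovered problems. That estimate is only as good as the homogeneity assumption: if some problems are much harder to see than others, a single p understates how many remain.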
Figure 1. Binomial model fit of the Law and Hvannberg study.8 Axes: times
discovered (x), frequency (y). nlogLik = 168.906; AIC = 339.859.
Empirical seen: 88; binomial prob = 0.138; unseen: 8.
Figure 2. Binomial model fit with Good-Turing adjustment of the Law and
Hvannberg study.8 Axes: times discovered (x), frequency (y). Empirical
seen: 88; binomial prob = 0.094; unseen: 20.
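The two fitted probabilities reported with the figures (0.138 unadjusted, 0.094 with the Good-Turing adjustment) imply different sample sizes under the geometric series formula. The sketch below inverts P(n) = 1 - (1 - p)^n for a target discovery rate. The formula's form and the function name are assumptions, and the 80% target is borrowed from the discussion above only as an example.

```python
import math


def required_sample_size(p: float, target: float) -> int:
    """Smallest n with 1 - (1 - p)**n >= target, assuming a single
    homogeneous discovery probability p."""
    if not (0.0 < p < 1.0 and 0.0 < target < 1.0):
        raise ValueError("p and target must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


for label, p in (("unadjusted", 0.138), ("Good-Turing adjusted", 0.094)):
    print(label, required_sample_size(p, 0.80))
```

Under these assumptions the unadjusted estimate calls for 11 sessions and the adjusted one for 17, so a modest change in p shifts the recommendation considerably.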
However, when evaluating the prediction from small-size subsamples via
Monte-Carlo sampling, Lewis treated
the original studies as if they were
complete. Hence, he did not adjust the
baseline of total problem counts for
potentially undiscovered problems,
which is critical at small process size or
low effectiveness. For example, in Lewis’s MacErr dataset, a usability testing
study with 15 participants, about 50%
of problems (76 of 145) were discovered only once. This ratio indicates a
large number of problems with low visibility, so it is unlikely that all of them
would be discovered with a sample of
only 15 users. Hence, the dataset may
be incomplete.
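The share of problems seen only once is the quantity a Good-Turing style discount works with. A minimal sketch follows, assuming the commonly cited form p_adj = p_est / (1 + N1/N), where N1 is the number of problems seen once and N the number seen at all. That form is not given in this section, and Lewis's full procedure may include further terms. The raw estimate used below is a hypothetical placeholder, because no raw p is reported here for MacErr.

```python
def good_turing_adjust(p_est: float, seen_once: int, seen_total: int) -> float:
    """Discount a raw discovery-probability estimate by the share of
    problems observed exactly once (assumed form of the GT adjustment)."""
    if seen_total <= 0 or not 0 <= seen_once <= seen_total:
        raise ValueError("need 0 <= seen_once <= seen_total and seen_total > 0")
    return p_est / (1.0 + seen_once / seen_total)


# Counts from the MacErr example in the text: 76 of 145 problems seen once.
singleton_share = 76 / 145

p_raw = 0.20  # hypothetical raw estimate, NOT a value from the dataset
p_adj = good_turing_adjust(p_raw, seen_once=76, seen_total=145)
print(round(singleton_share, 3), round(p_adj, 3))
```

The more problems are seen only once, the stronger the discount. This reflects the intuition in the text that many singletons signal many problems still undiscovered.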
Moreover, Lewis’s approach was
still based on Virzi’s original formula,
including its homogeneity assumption. In 2008, I showed that homoge-