INTERACTIONS.ACM.ORG 82 INTERACTIONS NOVEMBER–DECEMBER 2018
FORUM EVALUATION AND USABILITY
This forum addresses conceptual, methodological, and professional issues that arise
in the UX field’s continuing effort to contribute robust information about users to product
planning and design. — David Siegel and Susan Dray, Editors
• to allow participating experienced
professionals to further increase their
skills.
Early on, I also formulated some
non-purposes:
• to pick a winner
• to make a profit.
OVERVIEW OF THE CUE STUDIES
Table 1 provides an overview of the
CUE studies.
CUE-1 to CUE-6 were classic CUE
studies: All of them studied usability
evaluation methods, in particular
usability testing, expert review, and
heuristic inspection.
In CUE-7, six experts made
recommendations for fixing specified,
real usability problems on IKEA’s
website. The purpose was to derive
a set of usable recommendations for
writing recommendations [3].
CUE-8 focused on the measurement
of usability, in particular task time. In
this study, the usability test tasks were
prescribed [4].
CUE-9 focused on the evaluator
effect. Thirty-five participants in the
U.S. and Germany independently
analyzed the same five videos of
unmoderated usability test sessions of
a truck-rental website. In this study,
all test tasks and test participants
were the same. The study confirmed
the evaluator effect: Different
moderators reported somewhat different
issues even though they watched the
same videos [5].
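The evaluator effect is often quantified with the any-two agreement measure: the average overlap between the issue sets reported by every pair of evaluators. As a rough sketch (the issue IDs below are hypothetical, not taken from CUE-9):

```python
from itertools import combinations

def any_two_agreement(issue_sets):
    """Average Jaccard overlap |A & B| / |A | B| over every pair of
    evaluators' reported issue sets; 1.0 means all evaluators reported
    exactly the same issues, 0.0 means no shared issues at all."""
    pairs = list(combinations(issue_sets, 2))
    overlaps = [len(a & b) / len(a | b) for a, b in pairs]
    return sum(overlaps) / len(overlaps)

# Hypothetical issue IDs reported by three evaluators of the same sessions:
reports = [
    {"nav-1", "form-2", "copy-3"},
    {"nav-1", "form-2"},
    {"nav-1", "price-4"},
]
print(round(any_two_agreement(reports), 2))  # → 0.42
```

Even in this small made-up example, evaluators who watched the same sessions agree on less than half of the reported issues, which mirrors the pattern the CUE studies found.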
CUE-10 studied test moderation.
Sixteen moderators, mostly
professionals with considerable
experience, video-recorded themselves
and their test participants during
• 15 or so professional UX teams.
Directions:
• Ask the teams to evaluate the
usability of the website independently
and simultaneously.
• Compare the anonymous usability
test reports from the teams in a
one-day workshop where all teams
participate.
• Marvel at the substantial
differences in approach, reporting, and
results.
• Repeat over 10-plus studies.
Since 1998, this has been the
recipe for 10 successful Comparative
Usability Evaluation (CUE) studies
with more than 140 participating
teams. These studies have produced
unique insights into how experienced
UX professionals do usability testing.
In a CUE study, teams
simultaneously and independently
evaluate the same product. All of
the teams are given the same test
scenario and objectives for the same
interface, most often a website. Each
team then conducts a study using their
organization’s standard procedures
and techniques, for example, usability
testing or heuristic evaluation. After
each team has completed its study, it
submits its results in the form of an
anonymous report. In a subsequent
one-day workshop, all participants
meet and discuss the reports, the
differences between them, the reasons
for the differences, and how to improve
the test process. The differences are
often stunning.
Participation in a CUE study is
mostly driven by curiosity and an
eagerness to learn. Participation is
mostly free except that participants are
asked to cover direct expenses, such as
room rental. Participating teams and I
are not compensated financially, except
that CUE-5 and CUE-6 were organized
as for-profit workshops for which I
received a fee.
Most of the teams have been from
the U.S., but a considerable number of
German, English, and Danish teams
have also participated. Most of the
anonymous CUE test reports are freely
available [1].
PURPOSES AND
NON-PURPOSES
The key purposes of all CUE studies
have been:
• to survey the state of the art
within professional usability testing of
websites
• to investigate the reproducibility of
usability test results
Rolf Molich, DialogDesign
Are Usability Evaluations
Reproducible?
Insights
→ The total number of usability issues
in modern, complex websites is
much larger than you can hope to
find in a usability test.
→ Five users are nowhere near enough
to find 75 percent, or even 25 percent,
of the usability problems in a complex
website such as a car-rental website.