The CUE participants have
done a great job of opening our eyes to
the possibilities for improvement.
Usability evaluation should no
longer be an artistic activity where
freedom prevails. To reap the
maximum benefit from usability
evaluations, standard procedures
should be defined and enforced,
just as doctors are obligated to
follow standard procedures for most of their work.
The CUE studies show that our
simple assumption that we are all doing
the same thing when we do a test or an
evaluation is incorrect.
Special thanks are due to the more
than 140 usability specialists who have
participated in the CUE studies over
the past 20 years. Many participants
enjoyed the experience so much that
they participated in several studies.
1. http://www.dialogdesign.dk/CUE.html
2. Molich, R. and Dumas, J.S. Comparative
usability evaluation – CUE-4. Behaviour
& Information Technology 27, 3 (2008).
3. Molich, R., Hornbæk, K., Krug, S., Scott,
J., and Johnson, J. Recommendations on
recommendations. User Experience 7, 4.
4. Molich, R., Chattratichart, J., Hinkle,
V., Jensen, J.J., Kirakowski, J., Sauro, J.,
Sharon, T., and Traynor, B. Rent a car
in just 0, 60, 240 or 1,217 seconds? –
Comparative usability measurement,
CUE-8. Journal of Usability Studies 6, 1.
5. Hertzum, M., Molich, R., and Jacobsen,
N.E. What you get is what you see:
Revisiting the evaluator effect in usability
tests. Behaviour & Information Technology
33, 2 (2013), 143–161.
6. Lindgaard, G. and Chattratichart,
J. Usability testing: What have we
overlooked? Proc. of the SIGCHI Conference
on Human Factors in Computing Systems.
ACM, New York, 2007, 1415–1424.
Rolf Molich manages DialogDesign, a
tiny Danish usability consultancy. In 2014,
he received the UXPA Lifetime Achievement
Award for his work on the Comparative
Usability Evaluation project. He is vice
president of the UXQB, which develops and
maintains the CPUX certification. He is also
the co-inventor of the heuristic evaluation method.
Report quality. The quality and size
of the usability test reports varied
dramatically. In CUE-2, the nine
reports ranged from five pages
to 52 pages, a tenfold difference!
Some reports lacked positive findings,
executive summaries, and screenshots.
Others were complete with detailed
descriptions of the team’s methods and
definitions of terminology. By looking
through the different reports, we can
quickly pick out the attributes that
would make our reports more helpful
to our clients.
Takeaway: Make your usability test
reports usable for the target audience:
management and developers.
• Limit yourself to at most 25 pages,
possibly by leaving out some of the
less important issues. Remember: You
haven’t found all the issues anyway.
• Include a one-page executive
summary and place it at the beginning
of the report.
• Include positive findings.
• Include screenshots, possibly with
callouts, to make the report more
informative and attractive for its users.
Task design. In CUE-2, nine teams
created 51 different tasks for the
same UI. We found each task to be
well designed and valid, but there was
scant agreement on which tasks were
critical. If each team had used the same
best practices, they would have
derived similar tasks from the test
scenario. But that isn't what happened:
there was virtually no overlap.
It was as if each team thought the
interface was for a completely different
product.
Takeaway: During task design, focus
on key tasks rather than secondary
tasks, however interesting they may be.
Few false alarms. We rigorously
evaluated each reported issue. We
paid particular attention to problems
reported by single teams only. We
found almost all described problems to
be reasonable and in accordance with
generally accepted advice on usable
design. The only exception was CUE-3,
where a few of the less experienced
teams reported questionable issues.
Takeaway: If you have some
usability experience and report issues
based on experience and observation,
your reported problems should be trustworthy.
Some have raised criticism of the CUE
studies. For example:
• “A matching process was used
to identify the usability findings that
reported the same usability issues. This
process involved judgment and may be
unreliable.”
While several experts reached
consensus about the matching of all
reported findings, we acknowledge
that others may group the findings
differently and that this may affect the
results—but not to the extent that it
will affect the general conclusions.
• “Participating teams had too much
freedom. You should have prescribed
more details of the study, for example,
the test script and the test tasks.
This would have made the results much
more comparable.”
We deliberately decided not to tell
some of the world’s most recognized
usability-test experts how to do a
usability test. This helped us gain
additional insights. Also, in CUE-9
all participating teams watched the
same videos but still reported different
findings.
• “Participants did not get paid. The
results of commercial studies would
have been much better.”
This does not match our experience;
moreover, none of our participants agreed
that this would have made a significant
difference.
The CUE studies raise some central
questions for future research on
usability-testing techniques. How
can we construct tests that find the
important usability problems as
quickly as possible? And how can we
improve our practices so that different
teams will consistently find the same
critical problems?
The practices of all the teams
in the CUE studies needed review,
formalization, and a general
tightening up. Since the teams were
professional, in all probability
everyone can benefit from reviewing
their practices. We can use this
analysis to hold a mirror up to our
own work. These long-overdue
experiments provide valuable material
for sharpening individual usability
practices.
DOI: 10.1145/3278154 © 2018 ACM 1072-5520/18/11 $15.00