we still do not have an accurate predictive model of users’ transition from recognition-based tracing to recall-based gesturing.
Modern theories of human motor control and learning have
made great progress in recent decades.34, 41, 42 Leveraging
findings and insights from that literature to make specific
gesture keyboard design and analysis decisions offers opportunities for deeper research.
Particularly lacking to date is a rigorous quantification of
gesture space density as a function of the keyboard layout
and the size of the lexicon. Without such a model it is difficult to fully understand error rate as a function of the speed-accuracy trade-off: “sloppy” gestures tend to be faster
but also more error-prone. Exact or statistical modeling
of the gesture keyboard’s speed-accuracy trade-off, incorporating human control behavior, is another important future research direction.
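One way to begin quantifying gesture space density is to compare the ideal gesture paths that a lexicon defines on a layout, for example by measuring each word's distance to its nearest gestural neighbor. The Python sketch below is illustrative only: the key coordinates, the resampling resolution, and the tiny eight-word lexicon are our assumptions, not values from the literature.

```python
# Sketch: probe gesture-space density by nearest-neighbor distances between
# ideal word gesture paths (polylines through key centers) on a Qwerty grid.
import math

# Approximate key centers on a unit-spaced Qwerty grid (x, y); row offsets
# of half a key width are a rough assumption.
ROWS = ["qwertyuiop", "asdfghjkl", "zxcvbnm"]
KEYS = {c: (x + 0.5 * r, r) for r, row in enumerate(ROWS) for x, c in enumerate(row)}

def resample(points, n=32):
    """Resample a polyline to n equidistant points so paths are comparable."""
    if len(points) == 1:
        return points * n
    d = [0.0]  # cumulative arc length at each vertex
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        d.append(d[-1] + math.hypot(x1 - x0, y1 - y0))
    total = d[-1]
    out = []
    for i in range(n):
        t = total * i / (n - 1)
        j = max(k for k in range(len(d)) if d[k] <= t)  # segment containing t
        j = min(j, len(points) - 2)
        seg = d[j + 1] - d[j]
        u = 0.0 if seg == 0 else (t - d[j]) / seg
        (x0, y0), (x1, y1) = points[j], points[j + 1]
        out.append((x0 + u * (x1 - x0), y0 + u * (y1 - y0)))
    return out

def ideal_path(word):
    return resample([KEYS[c] for c in word])

def path_distance(p, q):
    """Mean point-to-point distance between two resampled gesture paths."""
    return sum(math.hypot(a - c, b - d) for (a, b), (c, d) in zip(p, q)) / len(p)

lexicon = ["the", "they", "then", "than", "that", "this", "fun", "fin"]
paths = {w: ideal_path(w) for w in lexicon}
for w in lexicon:
    dist, nn = min((path_distance(paths[w], paths[v]), v) for v in lexicon if v != w)
    print(f"{w:>5}: nearest neighbor {nn} at distance {dist:.2f}")
```

Aggregating such nearest-neighbor distances over a full dictionary would give one concrete density measure as a function of layout and lexicon size.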
Also critically lacking in the literature to date is large-scale data logging and analysis of word-gesture keyboards in
everyday use, which may provide not only a more complete
understanding of user behavior but also data for large-scale
machine learning of gesture keyboard algorithms and their
parameters. Such work, of course, requires significant infrastructure and privacy-preservation efforts.
The core technology of a word-gesture keyboard can
conceivably be improved by using larger, longer-span language models that take into account several previous words
of context when computing the prior
belief in a word candidate. However, the trade-off between
the language model’s size and efficacy remains an open
question in the case of word-gesture keyboards. The spatial
model of gesture keyboards should also be more broadly
explored and tested. We have only explored a certain type
of simple and efficient local (location) and global (shape)
features for gesture keyboard recognition, but a variety of
features could be invented in the future, particularly given the
continual improvements in the processing speed and memory
capacity of mobile devices.
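To make the role of a longer-span language-model prior concrete, the following Python sketch computes a trigram prior with a simple "stupid backoff" scheme and uses it to rank word candidates. The toy corpus, the candidate list, and the backoff factor are illustrative assumptions; a deployed keyboard would use properly smoothed models trained on large corpora (see, e.g., Chen and Goodman8 on smoothing).

```python
# Sketch: a trigram language-model prior with "stupid backoff" to shorter
# spans when a longer n-gram is unseen. Counts come from a toy corpus.
from collections import Counter

corpus = "the quick brown fox can type the quick word the quick fox".split()

unigrams = Counter(corpus)
bigrams  = Counter(zip(corpus, corpus[1:]))
trigrams = Counter(zip(corpus, corpus[1:], corpus[2:]))
total = sum(unigrams.values())

def prior(w, context, alpha=0.4):
    """P(w | last two words of context), backing off to shorter spans."""
    c = tuple(context[-2:])
    if len(c) == 2 and trigrams[(c[0], c[1], w)]:
        return trigrams[(c[0], c[1], w)] / bigrams[(c[0], c[1])]
    if c and bigrams[(c[-1], w)]:
        return alpha * bigrams[(c[-1], w)] / unigrams[c[-1]]
    return alpha * alpha * unigrams[w] / total

# Rank hypothetical candidates from the gesture recognizer by their prior.
candidates = ["quick", "quack", "fox", "word"]
ranked = sorted(candidates, key=lambda w: prior(w, ["type", "the"]), reverse=True)
print(ranked)
```

In a full system this prior would be combined with the spatial (location and shape) channel scores rather than used alone, and the memory cost of the larger model would have to be weighed against its gains.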
Gesture keyboards can also be used with other modalities. For example, if gestures can be effectively delimited
they may be incorporated into eye-tracking systems or 3D
full-body motion tracking systems, such as those used in
Microsoft game products. Gesture keyboards can also be
potentially integrated with speech input. In fact, there is
already an experimental system that simulates the effects of
a word-gesture keyboard combined with speech. 19
We have alluded to the keyboard layout issue several
times in this paper. For ease of adoption, Qwerty is a necessary default layout. It is very clear that the efficiency of
word-gesture keyboards can be significantly improved if
the keyboard layout is optimized. Qwerty is inefficient for
word-gesture keyboarding because the gesture strokes frequently zigzag between the left and right sides of the keyboard over relatively
long distances. For this reason, we would want the keyboard to be arranged so that frequent letter-key pairs tend
to be closer to each other. The layout of a gesture keyboard
can also be optimized toward ambiguity minimization, so
that word gestures are more distinct from one another.
Not only would this make gesture keyboards more error-tolerant, but it would also facilitate the ease-to-efficiency progression,
since gestures defined on such a layout should be more
distinguishable. How to optimize the layout toward multiple objectives is another open question.2 Even more
challenging is how to get users to realize the benefits of an
optimized layout and to learn it quickly, perhaps in a playful fashion.23
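The movement-efficiency objective described above can be sketched as a simple combinatorial optimization: minimize the frequency-weighted travel distance between letter pairs by hill-climbing on key swaps. Everything concrete here (the grid shape, the toy digraph frequencies, the iteration count) is a hypothetical stand-in; a real optimizer would use corpus-wide digraph statistics and would also fold in ambiguity minimization and other objectives.

```python
# Sketch: swap-based hill climbing toward a layout that places frequent
# letter pairs closer together, reducing expected gesture travel distance.
import math, random

random.seed(7)
letters = "abcdefghijklmnopqrstuvwxyz"
slots = [(x, y) for y in range(3) for x in range(9)][:26]  # 26 of 27 grid cells

# Toy digraph frequencies standing in for corpus statistics.
words = "the quick brown fox jumps over the lazy dog that then there".split()
freq = {}
for w in words:
    for a, b in zip(w, w[1:]):
        freq[(a, b)] = freq.get((a, b), 0) + 1

def cost(assign):
    """Expected travel: sum over digraphs of frequency * key-center distance."""
    return sum(f * math.dist(assign[a], assign[b]) for (a, b), f in freq.items())

assign = dict(zip(letters, slots))  # start from an arbitrary (alphabetical) layout
best = cost(assign)
for _ in range(20000):
    a, b = random.sample(letters, 2)
    assign[a], assign[b] = assign[b], assign[a]      # try swapping two keys
    c = cost(assign)
    if c < best:
        best = c                                     # keep an improving swap
    else:
        assign[a], assign[b] = assign[b], assign[a]  # revert
```

Greedy swapping like this only finds a local optimum; published layout optimizations have used stronger search methods (e.g., simulated annealing) and multiple weighted objectives, which is exactly where the open questions above lie.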
This article is a synthesis of a set of previous publications. 15–17, 21–23, 45–47, 49–51 We thank IBM Research, Linköping
University, and many friends and colleagues for
their years of support and contribution. Without their
appreciation of innovation for long-term impact this
sustained research program would not have been possible.
We thank Stu Card and Bill Buxton for their insightful
comments and suggestions that have greatly improved this
article. We also thank Kelly Tierney of IBM Corporate Design
who rendered the illustration in Figure 2.
1. Accot, J., Zhai, S. More than dotting the i's – foundations for crossing-based interfaces. In Proceedings of CHI 2002: ACM Conference on Human Factors in Computing Systems, CHI Letters 4, 1 (2002),
2. Bi, X., Smith, B.A., Zhai, S. Multilingual touchscreen keyboard design and optimization. Hum. Comput. Interact. (2012), to appear (available online at
3. Buxton, W. Human Input to Computer Systems: Theories, Techniques and Technology, book manuscript, available at http://www.
4. Buxton, W. Chunking and phrasing and the design of human-computer dialogues. In Proceedings of IFIP World Computer Congress (Dublin, Ireland, 1986), 475–480.
5. Buxton, W. Piloting through the maze. Interactions Mag. 12, 6 (2005), November +
6. Card, S., Moran, T., Newell, A. The Psychology of Human-Computer Interaction, Lawrence Erlbaum Associates, Hillsdale, NJ, 1983.
7. Cao, X., Zhai, S. Modeling human performance of pen stroke gestures. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems (2007), ACM,
8. Chen, S.F., Goodman, J. An empirical study of smoothing techniques for language modeling. In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics (1996), Association for Computational Linguistics, Santa Cruz, CA, 310–318.
9. Cooper, W.E., ed. Cognitive Aspects of Skilled Typewriting, Springer-Verlag, New York, 1983.
10. David, P.A. Clio and the economics of QWERTY. Am. Econ. Rev. 75 (1985)
11. Fitts, P.M. The information capacity of the human motor system in controlling the amplitude of movement. J. Exp. Psychol. 47, 6
12. Getschow, C.O., Rosen, M.J. A systematic approach to design a minimum distance alphabetical keyboard. In Proceedings of RESNA (Rehabilitation Engineering Society of North America) 9th Annual Conference (Minneapolis, MN, 1986),