The Communications Web site, cacm.acm.org,
features 13 bloggers in the BLOG@CACM
community. In each issue of Communications,
we'll publish excerpts from selected posts,
plus readers' comments.
DOI:10.1145/1516046.1516072
cacm.acm.org/blogs/blog-cacm
Speech-Activated
User Interfaces and
Climbing Mt. Exascale
Tessa Lau discusses why she doesn’t use the touch screen
on her in-car GPS unit anymore, and Daniel Reed considers
the future of exascale computing.
From Tessa Lau's
"Hello, Computer"
Four years ago when I
bought my first in-car
Global Positioning System (GPS) unit, it felt
like a taste of the future. The unit
knew where I was, and regardless
of how many wrong turns I made, it
could tell me how to get where I wanted to go. It was the ultimate adaptive
interface: No matter where I started,
it created a customized route that
would lead me to my destination.
Alas, my first GPS unit met an untimely end in a theft involving a dark
night, an empty street, and a smashed
window.
My new GPS, a Garmin nüvi 850,
comes with a cool new feature:
speech-activated controls.
Speech recognition brings a new
dimension to the in-car human-computer interface. When you’re driving,
you’re effectively partially blind and
have no hands. Being able to talk to
the computer and instruct it using
nothing but your voice is amazingly
empowering, and makes me excited
about the future of voice-based interfaces.
The nüvi’s interface is simple and
well designed. There's a wireless, button-activated microphone that you
mount to your steering wheel. When
you activate the mic, a little icon appears on the GPS screen to indicate
that it’s listening, and the GPS plays
a short “I’m listening” tone. You can
speak the names of any buttons that
appear on the screen or one of the
always-active global commands (e.g.,
“main menu,” “music player,” or
“go home”). Musical tones indicate
whether the GPS has successfully interpreted your utterance. If it recognized your command, it takes you to
the next screen and verbally prompts
you for the next piece of information
(e.g., the street address of your destination). Most of the common GPS
functionality can be activated via spoken confirmations without even looking at the screen.
Lists (e.g., of restaurant names)
are annotated with numbers so you
only have to speak the number of the
item you want from the list. However,
it also seems to correctly recognize
the spoken version of anything in the
list, even if it’s not displayed on the
current screen (e.g., the name of an
artist in the music player).
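To make that dispatch behavior concrete, here is a minimal sketch in Python of how a voice interface like this might resolve a recognized utterance: always-active global commands first, then the buttons visible on the current screen, then numbered list items, by number or by name. Everything in it, from the screen model to any command beyond the three global ones quoted above, is a hypothetical illustration, not Garmin's actual implementation.

# A toy model of command dispatch in a speech-activated UI like the
# nüvi's. All names and structures here are hypothetical illustrations.

GLOBAL_COMMANDS = {"main menu", "music player", "go home"}

class Screen:
    def __init__(self, name, buttons=(), list_items=()):
        self.name = name
        self.buttons = set(buttons)         # speakable on-screen buttons
        self.list_items = list(list_items)  # numbered choices, 1-based

def dispatch(utterance, screen):
    """Resolve a recognized utterance against the current screen.

    The order mirrors the behavior described above: always-active
    global commands first, then on-screen buttons, then numbered
    list items (by number or by the item's own name).
    """
    text = utterance.strip().lower()
    if text in GLOBAL_COMMANDS:
        return ("jump", text)              # e.g., "go home" from anywhere
    if text in screen.buttons:
        return ("press", text)             # speak any visible button
    if text.isdigit():                     # assumes the recognizer emits digits
        n = int(text)
        if 1 <= n <= len(screen.list_items):
            return ("select", screen.list_items[n - 1])
    for item in screen.list_items:         # full names also work, even for
        if text == item.lower():           # items scrolled off-screen
            return ("select", item)
    return ("retry", None)                 # play the "not recognized" tone

# Example: picking a restaurant from a results list
screen = Screen("restaurants",
                buttons={"back", "spell name"},
                list_items=["Thai Basil", "Luna Cafe", "The Counter"])
print(dispatch("2", screen))           # ('select', 'Luna Cafe')
print(dispatch("thai basil", screen))  # ('select', 'Thai Basil')
print(dispatch("go home", screen))     # ('jump', 'go home')

The ordering is the interesting design choice: checking the global commands first is what lets "go home" work from any screen, much as a shell command on the PATH works from any directory.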
In my tests it’s been surprisingly
accurate at interpreting my speech,
despite the generally noisy environment on the road.
What has surprised me the most
about this interface is that the voice-based control is so enjoyable and
fast that I don’t use the touch screen
anymore. Speech recognition, which
had been in the realm of artificial intelligence for decades, has finally
matured to the point where it’s now
reliable enough for use in consumer
devices.
Part of the power of the speech-activated user interface comes from
the ability to jump around in the interface by spoken word. Instead of
having to navigate through several
different screens by clicking buttons, you can jump straight to the
desired screen by speaking its name.
It's reminiscent of the difference between graphical user interfaces (GUIs)
and command lines; GUIs are easier
to learn, but once you master them,
command lines offer more efficiency
and power. As is the case with command lines, it takes some experimentation to discover what commands
are available when; I’m still learning
about my GPS and how to control it
more effectively.
Kudos, Garmin, you’ve done a great