The Communications Web site, cacm.acm.org, features 13 bloggers in the BLOG@CACM community. In each issue of Communications, we’ll publish excerpts from selected posts, plus readers’ comments.
Four years ago when I bought my first in-car Global Positioning System (GPS) unit, it felt like a taste of the future. The unit knew where I was, and regardless of how many wrong turns I made, it could tell me how to get where I wanted to go. It was the ultimate adaptive interface: No matter where I started, it created a customized route that would lead me to my destination.
Alas, my first GPS unit met an untimely end in a theft involving a dark night, an empty street, and a smashed window.
My new GPS, a Garmin nüvi 850, comes with a cool new feature: speech-activated controls.
Speech recognition brings a new dimension to the in-car human-computer interface. When you’re driving, you’re effectively partially blind and have no hands. Being able to talk to the computer and instruct it using nothing but your voice is amazingly empowering, and makes me excited about the future of voice-based interfaces.
The nüvi’s interface is simple and well designed. There’s a wireless, button-activated microphone that you mount to your steering wheel. When you activate the mic, a little icon appears on the GPS screen to indicate that it’s listening, and the GPS plays a short “I’m listening” tone. You can speak the names of any buttons that appear on the screen or one of the always-active global commands (e.g., “main menu,” “music player,” or “go home”). Musical tones indicate whether the GPS has successfully interpreted your utterance. If it recognizes your command, it takes you to the next screen and verbally prompts you for the next piece of information (e.g., the street address of your destination). Most of the common GPS functionality can be activated via spoken confirmations without even looking at the screen.
Lists (e.g., of restaurant names) are annotated with numbers so you only have to speak the number of the item you want from the list. However, it also seems to correctly recognize the spoken version of anything in the list, even if it’s not displayed on the current screen (e.g., the name of an artist in the music player).
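To make that behavior concrete, here is a minimal Python sketch of how a unit like this might map a recognized phrase to an action. It is my guess at the logic, not Garmin’s implementation: the function name `resolve`, the order in which the sources are checked, and the small number-word table are all assumptions made for illustration, and the speech recognition itself is assumed to have already produced plain text.

```python
from typing import Optional, Sequence, Tuple

# Always-active commands mentioned above; the real set is unknown (assumption).
GLOBAL_COMMANDS = {"main menu", "music player", "go home"}

# Spoken forms of list numbers; a hypothetical, deliberately tiny table.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5}


def resolve(
    utterance: str,
    buttons: Sequence[str],
    visible_items: Sequence[str],
    all_items: Sequence[str],
) -> Optional[Tuple[str, str]]:
    """Return (kind, target) for a recognized phrase, or None if nothing matches."""
    text = utterance.strip().lower()

    # 1. Global commands work on every screen.
    if text in GLOBAL_COMMANDS:
        return ("global", text)

    # 2. Any button currently shown on the screen can be spoken by name.
    for label in buttons:
        if label.lower() == text:
            return ("button", label)

    # 3. A spoken number selects the matching item on the visible page.
    number = NUMBER_WORDS.get(text)
    if number is None and text.isdecimal():
        number = int(text)
    if number is not None and 1 <= number <= len(visible_items):
        return ("item", visible_items[number - 1])

    # 4. A spoken name may match any list entry, even one that is off-screen.
    for item in all_items:
        if item.lower() == text:
            return ("item", item)

    return None
```

With a visible page of two restaurants out of three, saying “two” would pick the second visible entry, while saying the name of the third would still find it even though it is not on the screen.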
In my tests it’s been surprisingly accurate at interpreting my speech, despite the generally noisy environment on the road.
What has surprised me the most about this interface is that the voice-based control is so enjoyable and fast that I don’t use the touch screen anymore. Speech recognition, which had been in the realm of artificial intelligence for decades, has finally matured to the point where it’s now reliable enough for use in consumer devices.
Part of the power of the speech-activated user interface comes from the ability to jump around in the interface by spoken word. Instead of having to navigate through several different screens by clicking buttons, you can jump straight to the desired screen by speaking its name. It’s reminiscent of the difference between graphical user interfaces (GUIs) and command lines: GUIs are easier to learn, but command lines, once you master them, offer more efficiency and power. As is the case with command lines, it takes some experimentation to discover which commands are available when; I’m still learning about my GPS and how to control it more effectively.
Kudos, Garmin, you’ve done a great job.