news
Technology | DOI: 10.1145/2133806.2133812
Tom Geller
talking to machines
Voice recognition programs like Siri are now capable
of understanding spoken commands, recognizing a conversation’s
context, and answering questions in a personable manner.
When Apple integrated Siri into the iOS operating system last October, it spurred iPhone owners to start talking to their phones as well as through
them. The program, which converts
spoken commands such as “Schedule
dinner with Lisa at 6 tonight” into calendar appointments, Web searches,
and the like, is the most widely distributed example of a cognitive assistant to date. More than four million
iPhone 4S units featuring Siri were sold
during the phone's first weekend on sale. Users might see Siri as simple speech recognition, but its abilities go far beyond
transcription.
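To make the kind of conversion described above concrete, here is a minimal Python sketch that turns the article's example command, once it has been transcribed to text, into a calendar entry. It is a hypothetical illustration, not Siri's method. The function name `parse_schedule_command`, the single regular-expression pattern, and the rule that "tonight" means an evening hour are all assumptions made for this example.

```python
import re
from datetime import date, datetime, time

# Hypothetical grammar for one command shape:
#   "Schedule <event> with <person> at <hour> tonight"
# Hours are limited to 1-11 so that "tonight" maps cleanly to 13:00-23:00.
SCHEDULE_PATTERN = re.compile(
    r"^schedule\s+(?P<event>\w+)\s+with\s+(?P<person>\w+)"
    r"\s+at\s+(?P<hour>1[01]|[1-9])\s+tonight$",
    re.IGNORECASE,
)


def parse_schedule_command(text: str, today: date):
    """Turn a transcribed command into a calendar-entry dict, or None if it does not fit."""
    match = SCHEDULE_PATTERN.match(text.strip().rstrip("."))
    if match is None:
        # Not a scheduling command; an assistant might fall back to a Web search here.
        return None
    # Assumption: "tonight" means an evening hour, so shift 6 -> 18, 11 -> 23, and so on.
    hour = int(match.group("hour")) + 12
    return {
        "title": f"{match.group('event').capitalize()} with {match.group('person')}",
        "start": datetime.combine(today, time(hour)),
    }


entry = parse_schedule_command("Schedule dinner with Lisa at 6 tonight", date(2012, 1, 10))
print(entry)
```

A real assistant must cope with far more phrasings and must resolve context this sketch ignores, such as which "Lisa" in the address book is meant and which calendar to use. The hard-coded pattern only shows the shape of the output: free-form speech becomes a structured record another application can act on.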
Siri marks an important moment: voice recognition, information management, artificial intelligence, task fulfillment, and user
interface have come together in a way the general
public finds usable and productive.
As Wolfram Research executive director Luc Barthelet says, “The news
about Siri is that it works. People have
tried to get computers to answer questions conversationally for at least 15
years, but only now has the technology reached a threshold where people
overall like it.” The iPhone’s popularity also gives intelligent software assistants wider exposure than they would
get otherwise. Roger K. Moore, editor
in chief of the journal Computer Speech
and Language, points out that “the field
of research hasn’t changed dramatically. What’s new is that Siri’s brought
several complementary technologies
together. Our business has been going
for many years. Only now, with Siri, everybody knows about it.”
Siri can answer a wide variety of spoken questions in a conversational manner even in difficult conditions, but, like its human inventors, it has yet to solve the P versus NP problem.
executing commands
There is a long road between the spoken command and its fulfillment,
though. The first step in the process
is to convert the audio of speech into
meaning. The two main applications
of speech recognition, dictation and
command recognition, have forced
researchers to pursue parallel methods
that balance the demands of vocabulary, accent, and
context.
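One way to see why dictation and command recognition call for different methods is a toy Python sketch, not a description of how Siri or any other product works. It assumes a recognizer that returns an "n-best" list of candidate transcripts with scores. The scores, the `COMMANDS` set, and the function names are all invented for illustration. Dictation must accept any words, so it simply keeps the top-scoring guess. Command recognition knows only a small set of phrases, so it can discard acoustically likely but meaningless guesses.

```python
from typing import List, Optional, Tuple

# Hypothetical recognizer output: (transcript, score) pairs; a higher score means more likely.
Hypothesis = Tuple[str, float]

# Hypothetical command grammar: the small, closed set of phrases the device accepts.
COMMANDS = {"call home", "play music", "set alarm", "read messages"}


def dictation(nbest: List[Hypothesis]) -> str:
    """Dictation: the vocabulary is open, so take the best-scoring hypothesis as-is."""
    return max(nbest, key=lambda h: h[1])[0]


def command(nbest: List[Hypothesis]) -> Optional[str]:
    """Command recognition: keep only hypotheses the grammar allows, then take the best."""
    allowed = [h for h in nbest if h[0] in COMMANDS]
    if not allowed:
        return None  # nothing matched; a device might ask the user to repeat
    return max(allowed, key=lambda h: h[1])[0]


nbest = [("call Rome", 0.46), ("call home", 0.41), ("all home", 0.13)]
print(dictation(nbest))  # call Rome
print(command(nbest))    # call home
```

The design trade-off mirrors the one in the text: the closed vocabulary lets command mode recover from a misheard word, at the cost of rejecting anything outside the list. Dictation keeps the open vocabulary but has no such safety net.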