higher, more stable rate of speech recognition for translation into text.
Speech-to-text conversion is far from perfect, however, as it is affected by factors such as audio devices, speaking style, and ambient noise. At the IBM Tokyo Research Laboratory, Takashi Saito, manager of the Accessibility Center, leads a group focused on correcting speech recognition errors. “This is tedious work,” says Saito. “First you listen to the audio to find errors in the text, then you delete the wrong characters, and then you input the correct characters. Our goal is to minimize the total time required for this process by simplifying the correction operations and minimizing the necessary keystrokes.”
Improving the quality of speech recognition is also playing a role in a proof-of-concept wheelchair at MIT. Finale Doshi, a graduate student in computer science, has designed a voice-activated wheelchair command system that uses machine learning to create and navigate a map of its environment. The wheelchair-bound person issues verbal commands to the guidance system to move from point to point on the map. High-quality, easily trainable speech recognition devices that operate reliably are the key to implementing the wheelchair.
“People who use wheelchairs often have a lot of shaking, even people who don’t have severe degenerative conditions,” says Doshi. “It takes far less mental concentration to maneuver a wheelchair if you can issue commands verbally rather than manually. This is a very active area of research at MIT.”
In Seattle, Ladner and his students in the Department of Computer Science and Engineering at the University of Washington have their own active areas of research. Their MobileASL project combines enhanced video compression with a cell phone configured as a video phone to provide more effective communication between people who sign and remote translators who provide American Sign Language (ASL) and text relay services. The video lens is on the same side of the device as the phone’s screen, which has two panels: one displays the remote translator and the other displays the cell phone user.
Ladner insists the raison d’être for all accessibility technology is to optimize people’s lives. “Accessible technology is about accepting, for instance, that people use sign language and making the phone adapt to their needs. It’s not about a prosthesis or replacing something that’s taken millions of years to evolve. Not everybody wants a cochlear implant, which requires major surgery and can cause problems with balance.”
In conjunction with the Rochester Institute of Technology, Ladner’s group is also working to establish a DHH (deaf or hard of hearing) Cyber-Community between universities to increase enrollment of students who are deaf or hard of hearing in science, engineering, and mathematics from undergraduate levels through doctoral programs.
Also related to student access, Ladner’s group is developing a tool that translates textbooks, so a person who is blind can fully understand the content. “Between Braille and optical character recognition, words in textbooks are fairly accessible, but the figures are still difficult. We’re replacing figures with textures through an automated process using our Tactile Graphics Assistant,” he says.
Ladner’s students are also contributing to the growing worldwide effort to improve Internet accessibility. “Unfortunately, a lot of Web pages are not all that accessible for people who are blind or dyslexic,” he says. “Web designers use commercial development tools to make things look good, but don’t create a logical structure behind the page that’s navigable with a screen reader. Frequently, there’s also no alternative text inserted for figures.”
A pair of scientists at Carnegie Mellon University have created a computational model that can predict the brain activation patterns associated with concrete nouns, Science reports. Computer scientist Tom M. Mitchell and cognitive neuroscientist Marcel Just previously used functional magnetic resonance imaging (fMRI) to detect and pinpoint brain activity when a person thinks of a specific noun. With the fMRI data, the scientists created a computational model that enables a computer to determine what word a person is thinking of by analyzing brain scan data. In their latest research, Mitchell and Just used fMRI data to develop a computational model that can predict the brain activation patterns related to concrete nouns even if the computer did not possess fMRI data for a specific noun.
Mitchell and Just’s research could have applications in the study of autism, paranoid schizophrenia and other thought disorders, and semantic dementias such as Pick’s disease.