on data such as radio broadcasts and
newspapers to build a credible system, King says, without the expense
of hiring linguistic experts and professional voice artists. He has already
built a Swahili prototype, which he
says works pretty well.
King also has developed a system that can take a small number of recordings of a particular individual’s
speech and apply them to a model
already trained with a much larger
dataset, and use that to generate new
speech that sounds like that individual. The system is undergoing clinical
trials in a U.K. hospital to see if it can
be a practical way of helping people
with amyotrophic lateral sclerosis,
who are expected to lose their ability
to speak as their disease progresses.
“This is not going to help them live
any longer, but for the time they do
live it could help make their quality of
life better,” he says.
Further Reading
Van den Oord, A., Dieleman, S.,
Zen, H., Simonyan, K., Vinyals, O.,
Graves, A., and Kalchbrenner, N.
WaveNet: A Generative Model for Raw Audio,
ArXiv, Cornell University Library, 2016
http://arxiv.org/pdf/1609.03499
King, S., and Karaiskos, V.
The Blizzard Challenge 2016, Blizzard
Challenge Workshop, Sept. 2016,
Cupertino, CA
http://www.festvox.org/blizzard/bc2016/blizzard2016_overview_paper.pdf
Arnela, M., Dabbaghchian, S., Blandin, R.,
Guasch, O., Engwall, O., Van Hirtum, A., and
Pelorson, X.
Influence of vocal tract geometry
simplifications on the numerical simulation
of vowel sounds, Journal of the Acoustical
Society of America, 140, 2016
http://dx.doi.org/10.1121/1.4962488
Deng, L., Li, J., Huang, J-T., Yao, K., Yu, D.,
Seide, F., Seltzer, M., Zweig, G., He, X.,
Williams, J., Gong, Y., and Acero, A.
Recent advances in deep learning for
speech research at Microsoft, IEEE
International Conference on Acoustics,
Speech and Signal Processing, 2013
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6639345
Simon King – Using Speech Synthesis to
Give Everyone Their Own Voice
https://www.youtube.com/watch?v=xzL-pxcpo-E
Neil Savage is a science and technology writer based in
Lowell, MA.
© 2017 ACM 0001-0782/17/3 $15.00
synthesis. At the moment, though, it
takes several hours of computing to
produce one second of speech, so it is
not immediately practical.
A Physical Model
Oriol Guasch, a physicist and mathematician at Ramon Llull University
in Barcelona, Spain, is also taking a
computationally intensive approach
to speech synthesis. He is working on
mathematically modeling the entire
human vocal tract. “We’d like to simulate the whole physical process, which
will, in the end, generate the final
sound,” he says.
To do that, he takes an MRI scan of a person’s vocal tract as that person
is pronouncing, say, the vowel “E.”
He then represents the geometry
of the vocal folds, soft palate, lips,
nose, and other parts with differential equations. Using that, he generates a computational mesh, a many-sided grid that approximates the
geometry. The process is not easy;
a desktop computer can generate a
mesh with three to four million elements in about three or four hours
to represent the short “A” sound, he
says. A sibilant “S,” though, requires
a computer with 1,000 processors to
run for a week to generate 45 million
elements. The added complexity of
that sound arises from the air flowing between the teeth and creating
turbulent eddies swirling in complex
patterns. Imagine, then, the time required to produce a whole word, let
alone a sentence.
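To put those figures side by side, here is a rough back-of-envelope sketch. It uses only the numbers Guasch quotes; the midpoints of his ranges and the treatment of the desktop as a single processor are assumptions made purely for illustration, and the function name is hypothetical.

```python
# Back-of-envelope comparison of the two mesh jobs quoted above.
# Element counts and run times come from the article; the midpoints and
# the single-processor desktop are assumptions for illustration only.

HOURS_PER_WEEK = 7 * 24


def processor_hours(processors: int, hours: float) -> float:
    """Total compute as processors multiplied by wall-clock hours."""
    return processors * hours


# Short "A": three to four million elements in about three or four hours.
vowel_elements = 3.5e6                 # assumed midpoint of the quoted range
vowel_cost = processor_hours(1, 3.5)   # assumes the desktop is one processor

# Sibilant "S": 45 million elements, 1,000 processors running for a week.
sibilant_elements = 45e6
sibilant_cost = processor_hours(1000, HOURS_PER_WEEK)

element_ratio = sibilant_elements / vowel_elements
cost_ratio = sibilant_cost / vowel_cost

print(f"sibilant cost: {sibilant_cost:,.0f} processor-hours")
print(f"mesh is about {element_ratio:.0f}x larger")
print(f"compute is about {cost_ratio:,.0f}x larger")
```

Under these assumptions, the mesh grows by roughly an order of magnitude while the computing effort grows by several orders of magnitude, which is consistent with the point that turbulence, not mesh size alone, drives the cost.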
Guasch sees his approach more as
an interesting computing challenge
than a practical attempt to create
speech. “The final goal is not just synthesizing
speech, it’s about reproducing
the way the human body behaves,”
he says. “I believe when you have a
computational problem, it’s good to
face it from many different angles.”
The University of Edinburgh’s
King, on the other hand, is working
toward practical applications. He recently
received funding for a three-year
project, in conjunction with the
BBC World Service, to create text-to-speech
systems for languages that do
not have enough speakers to make developing
a system a financially attractive
process for companies. It should
be possible to use machine learning
ACM Member News
TRYING TO DETERMINE WHAT IS COMPUTABLE
Santosh Vempala, Distinguished
Professor of Computer Science in the
College of Computing at the Georgia
Institute of Technology (Georgia
Tech), earned his undergraduate
degree in computer science at
the Indian Institute of
Technology, New Delhi, India,
and his Ph.D. in Algorithms,
Combinatorics, and
Optimization from Carnegie
Mellon University in 1997. “What
I really wanted to study was
theory of computation; what is
computable and what is not, with
what amount of resources.”
After obtaining his
doctorate, Vempala became a
professor of mathematics at
the Massachusetts Institute of
Technology, a post he held for
almost 10 years before moving
to Georgia Tech in 2006. There,
Vempala served as the first
director (from 2006 to 2011) of
the Algorithms and Randomness
Center, a think tank dedicated
to exploring the theory of
computing and optimization.
His research focuses on
the intersection of algorithms,
randomness, and geometry. “The
relationship between algorithms
and geometry has been mutually
beneficial,” Vempala explains.
Initially it was about using
techniques mathematicians
had developed to work with
algorithms in new ways, but now
the questions and answers have
made a deep contribution to
the development of algorithmic
geometry, he says.
Vempala continues to be
fascinated by whether certain
problems have efficient
solutions. Some of his recent
research has been focused on
trying to understand how the
brain works, and the modeling of
its computational abilities.
In 2008, Vempala launched
an initiative called Computing
for Good, which develops
deployable computing solutions
for social problems like
inequality, homelessness, and
healthcare delivery, in areas
where resources are constrained.
— John Delaney