equations that were able to use these
ratios to predict connections between
variables over time. “This was one of
the biggest challenges we were able
to overcome,” Lipson says. “There are
infinite trivial equations and just a few
interesting ones.”
Like human scientists, the software
favors equations with the fewest terms.
“We want to find the simplest equation
powerful enough to predict the dynamics of the system,” Schmidt says.
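The parsimony criterion Schmidt describes can be sketched in a few lines of Python. In this illustration, which is an assumption about the general approach rather than the researchers' actual code, each candidate equation carries both a prediction error and a complexity count, and only candidates that no rival beats on both counts survive; the simplest member of that surviving set is the preferred law.

```python
# A minimal sketch of parsimony-aware selection (a Pareto front over error
# and complexity).  The Candidate fields and example values are illustrative
# assumptions, not the Lipson/Schmidt data structures.

from dataclasses import dataclass

@dataclass
class Candidate:
    expression: str   # human-readable form of the evolved equation
    error: float      # prediction error on held-out data
    complexity: int   # e.g., number of terms and operators

def pareto_front(candidates):
    """Keep only candidates not beaten on both error and complexity."""
    front = []
    for c in candidates:
        dominated = any(
            (o.error <= c.error and o.complexity < c.complexity) or
            (o.error < c.error and o.complexity <= c.complexity)
            for o in candidates
        )
        if not dominated:
            front.append(c)
    # The "simplest equation powerful enough" is drawn from this front.
    return sorted(front, key=lambda c: c.complexity)

if __name__ == "__main__":
    pool = [
        Candidate("a*x + b",                       error=0.30,  complexity=3),
        Candidate("a*x + b*x",                     error=0.31,  complexity=4),
        Candidate("a*x**2 + b*x + c",              error=0.05,  complexity=6),
        Candidate("a*sin(x) + b*x**2 + c*x + d",   error=0.049, complexity=10),
    ]
    for c in pareto_front(pool):
        print(c.expression, c.error, c.complexity)
```

In this toy pool, the second candidate is dropped because the first is both simpler and more accurate; the rest remain as trade-offs between accuracy and simplicity.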
Applying Artificial Intelligence
Scientific data has become so voluminous and complex in many disciplines
today that scientists often don’t know
what to look for or even how to start analyzing it. So they are applying artificial
intelligence, via machine learning, to
giant data sets without precisely specifying in advance a desired outcome.
Unlike AI systems of the past, which
were usually driven by hard-coded expert rules, the idea now is to have the
software evolve its own rules primed
with an arbitrary starting point and a
few simple objectives.
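The flavor of that evolutionary approach can be conveyed with a generic sketch: begin with an arbitrary population of random candidates, keep only a simple objective (fit the observed data), and let mutation and selection do the rest. The target function and all parameters below are invented for illustration and do not describe the Lipson/Schmidt software itself.

```python
# A generic evolutionary-search sketch: random starting point, one simple
# objective, repeated mutation and selection.  Illustration only.

import random

random.seed(0)

# "Observed" data from an unknown process (here, secretly y = 2x + 1).
data = [(x, 2.0 * x + 1.0) for x in range(-5, 6)]

def error(candidate):
    """Objective: mean squared prediction error of the model y = a*x + b."""
    a, b = candidate
    return sum((a * x + b - y) ** 2 for x, y in data) / len(data)

def mutate(candidate):
    """Randomly perturb one coefficient."""
    a, b = candidate
    if random.random() < 0.5:
        a += random.gauss(0, 0.3)
    else:
        b += random.gauss(0, 0.3)
    return (a, b)

# Arbitrary starting point: a population of random guesses.
population = [(random.uniform(-5, 5), random.uniform(-5, 5)) for _ in range(50)]

for generation in range(200):
    # Selection: keep the better half, refill with mutated copies.
    population.sort(key=error)
    survivors = population[:25]
    population = survivors + [mutate(random.choice(survivors)) for _ in range(25)]

best = min(population, key=error)
print("best fit: y = %.2f*x + %.2f (error %.4f)" % (*best, error(best)))
```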
Automating the discovery of natural laws marks a major step into a
realm that was previously inhabited
solely by humans.
Eric Horvitz, an AI specialist at Microsoft Research, says it’s only the beginning. “Computers will grow to become scientists in their own right, with
intuitions and computational variants
of fascination and curiosity,” says Horvitz. “They will have the ability to build
and test competing explanatory models of phenomena, and to consider the
likelihoods that each of the proposed
models is correct. And they will understand how to progress through a space
of inquiry, where they consider the best
evidence to gather next and the best
new experiments to perform.”
A major challenge facing the European Organization for Nuclear Research (CERN) is how to use the 40
terabytes of data that its Large Hadron
Collider is expected to produce every
day. Processing that amount of data
would be a challenge if scientists knew
exactly what to look for, but in fact they
can hardly imagine what truths might
be revealed if only the right tests are
performed. CERN researchers have
turned to Lipson and Schmidt for help
in finding a way to search for scientific
laws hidden in the data. “It could be a
killer app for them,” Schmidt says.
Indeed, Lipson and Schmidt have
received so many requests to apply
their techniques to other scientists’
data that they plan to turn their methodology and software into a freely distributed tool.
Josh Bongard, a computer scientist and roboticist at the University of
Vermont, says the Lipson/Schmidt approach is noteworthy for its ability to
find equations with very little assumed
in advance about their form. “That
gives the algorithm more free rein to
derive relationships that we might not
know about,” he says.
Bongard says earlier applications of
machine learning to discovery have not
scaled well, often working for simple
systems, such as a single pendulum,
but breaking down when applied to a
chaotic system like a double pendulum. Further evidence of the scalability of the Lipson/Schmidt algorithm is
its apparent ability to span different
domains, from mechanical systems to
very complex biological ones, he says.
The Lipson/Schmidt work takes
search beyond “mining”—where a specific thing is sought—to “discovery,”
where “you are not sure what you are
looking for, but you’ll know it when you
find it,” Bongard says. A key to making
that possible with large stores of complex data is having algorithms that are
able to evolve building blocks from
simple systems into successively more
complex models.
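One way to picture that building-block idea, again as a loose illustration rather than a description of the actual algorithm, is a two-stage search in which subexpressions that prove useful early on are promoted to primitives and reused when assembling larger candidate models.

```python
# A sketch of evolving building blocks: useful simple subexpressions are
# promoted to primitives for a later, more complex round of search.
# Representations and names are illustrative assumptions.

import random

random.seed(1)

PRIMITIVES = ["x", "v", "1.0"]      # raw variables and constants
OPERATORS = ["+", "-", "*"]

def random_expr(blocks, depth=2):
    """Build an expression string by combining known building blocks."""
    if depth == 0 or random.random() < 0.3:
        return random.choice(blocks)
    op = random.choice(OPERATORS)
    return "(%s %s %s)" % (random_expr(blocks, depth - 1), op,
                           random_expr(blocks, depth - 1))

# Stage 1: candidates built only from raw primitives.
simple_candidates = [random_expr(PRIMITIVES, depth=1) for _ in range(5)]

# Stage 2: suppose two subexpressions proved useful; promote them to
# building blocks and search again at greater complexity.
building_blocks = PRIMITIVES + ["(v * v)", "(x * x)"]
complex_candidates = [random_expr(building_blocks, depth=2) for _ in range(5)]

print(simple_candidates)
print(complex_candidates)
```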
Such methods aim to complement
the efforts of scientists but not replace
them, as some critics have suggested.
“These algorithms help to bootstrap
science, to help us better investigate
the data and the models by acting like
an intelligent filter,” says Bongard.
Scientific research for decades has
followed a well-known path from data
collection (observation) to model formulation and prediction, to laws (expressed
as equations), and finally to a higher-level theoretical framework or interpretation of the laws. “We have shown we
can go directly from data to laws,” says
Schmidt, “so we are wondering if we can
go from laws to the higher theory.”
He and Lipson are now trying to automate that giant last step, but admit
they have little idea how to do it. Their
tentative first step uses a process of
analogy: a newly discovered but poorly
understood equation is compared with
similar equations from domains that are already understood.
For example, they recently mined a
large quantity of metabolic data provided by Gurol Suel, assistant professor
at the University of Texas Southwestern
Medical Center. The algorithm came
up with two “very simple, very elegant”
invariants—so far unpublished—that
are able to accurately predict new data.
But neither they nor Suel has any idea
what the invariants mean, Lipson says.
“So what we are doing now is trying to
automate the interpretation stage, by
saying, ‘Here’s what we know about the
system, here’s the textbook biology;
can you explain the new equations in
terms of the old equations?’ ”
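A crude sketch of that interpretation step, under the assumption that equations can be compared by their structure, might match a newly discovered invariant against a small library of textbook equations and report the closest known form. The similarity measure and the example library below are assumptions made purely for illustration.

```python
# A rough sketch of interpretation by analogy: compare the structure of a new
# equation against known textbook equations.  Illustration only; the measure
# and library are invented for this example.

import ast

def features(expr):
    """Collect operator and variable counts from a Python-style expression."""
    counts = {}
    for node in ast.walk(ast.parse(expr, mode="eval")):
        if isinstance(node, ast.BinOp):
            key = type(node.op).__name__    # Add, Mult, Pow, ...
        elif isinstance(node, ast.Name):
            key = "var"                     # treat all variables alike
        else:
            continue
        counts[key] = counts.get(key, 0) + 1
    return counts

def similarity(a, b):
    """Fraction of structural features the two expressions share."""
    keys = set(a) | set(b)
    shared = sum(min(a.get(k, 0), b.get(k, 0)) for k in keys)
    total = sum(max(a.get(k, 0), b.get(k, 0)) for k in keys)
    return shared / total if total else 0.0

textbook = {
    "harmonic oscillator energy": "0.5*m*v**2 + 0.5*k*x**2",
    "ideal gas law": "p*V - n*R*T",
}

new_invariant = "a*y**2 + b*z**2"   # a discovered but uninterpreted equation
f_new = features(new_invariant)
best = max(textbook, key=lambda name: similarity(f_new, features(textbook[name])))
print("closest known form:", best)
```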
Lipson says the ultimate challenge
may lie in dealing with laws so complicated they defy human understanding.
Then, automation of the interpretation
phase would be extremely difficult.
“What if it’s like trying to explain
Shakespeare to a dog?” he asks.
“The exciting thing to me is that
we might be able to find the laws at
all,” says Lipson. “Then we may have
to write a program that can take a very
complicated concept and break it down
so humans can understand it. That’s a
new challenge for AI.”
Further Reading
Schmidt, M. and Lipson, H.
Distilling free-form natural laws from
experimental data. Science 324 (Apr. 3, 2009), 81–85.
Waltz, D. and Buchanan, B.
Automating science. Science 324 (Apr. 3,
2009), 43–44.
King, R., Whelan, K., Jones, F., Reiser, P., Bryant,
C., Muggleton, S., Kell, D., Oliver, S.
Functional genomic hypothesis generation
and experimentation by a robot scientist.
Nature 427, 6971 (Jan. 15, 2004), 247–252.
Bongard, J. and Lipson, H.
Automated reverse engineering of nonlinear
dynamical systems. Proceedings of
the National Academy of Sciences 104, 24
(June 12, 2007), 9943–9948.
Koza, J.
Genetic Programming: On the Programming
of Computers by Means of Natural Selection.
MIT Press, Cambridge, MA, 1992.
Gary Anthes is a technology writer and editor based in
Arlington, VA.