News
Science | DOI: 10.1145/2160718.2160723
Neil Savage
Automating Scientific Discovery
Computer scientists are teaching machines to run experiments, make
inferences from the data, and use the results to conduct new experiments.
The power of computers to juggle vast quantities of data has proved invaluable to science. In the early days, machines performed calculations that would have taken humans far too long to perform by hand.
More recently, data mining has found
relationships that were not known to
exist—noticing, for instance, a correlation between use of a painkiller
and incidence of heart attacks. Now
some computer scientists are working on the next logical step—teaching
machines to run experiments, make
inferences from the data, and use the
results to perform new experiments.
In essence, they wish to automate the
scientific process.
Photograph courtesy of Aberystwyth University
One team of researchers, from Cornell and Vanderbilt universities and
CFD Research Corporation, took a
significant step in that direction when
they reported last year that a program
of theirs had been able to solve a complex biological problem. They focused
on glycolysis, the metabolic process by
which cells—yeast, in this case—break
down sugars to produce energy. The
team fed their algorithm with experimental data about yeast metabolism,
along with theoretical models in the
form of sets of equations that could fit
the data.
Adam, a robotic system, runs biological experiments on microtiter plates.
The team seeded the program with
approximately 1,000 equations, all of
which had the correct mathematical
syntax but were otherwise random.
The computer changed and recombined the equations and ranked the
results according to which produced
answers that fit the data—an evolutionary technique that has been used
since the early 1990s. The key step,
explains Hod Lipson, associate professor of computing and information
science at Cornell, was to rank how well the equations fit the data
not only at any given point in the dataset, but
also at points where competing models disagreed.
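The change-recombine-rank loop described above can be illustrated with a toy sketch. This is not Eureqa's code: the operator set, tree encoding, population size (100 here rather than roughly 1,000, to keep it quick), size cap, and every function name are assumptions made for illustration, and the target data is invented.

```python
import math
import random

# Hypothetical operator set; the article does not say which operators were used.
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}

def random_expr(rng, depth):
    """Build a random but syntactically valid expression tree over x."""
    if depth == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else rng.uniform(-2.0, 2.0)
    op = rng.choice(sorted(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))

def evaluate(expr, x):
    """Evaluate a tree: 'x', a float constant, or (op, left, right)."""
    if isinstance(expr, tuple):
        op, left, right = expr
        return OPS[op](evaluate(left, x), evaluate(right, x))
    return x if expr == "x" else expr

def size(expr):
    """Count the nodes in a tree."""
    return 1 + size(expr[1]) + size(expr[2]) if isinstance(expr, tuple) else 1

def error(expr, data):
    """Mean squared error against the data; non-finite results rank last."""
    total = 0.0
    for x, y in data:
        diff = evaluate(expr, x) - y
        total += diff * diff
    mse = total / len(data)
    return mse if math.isfinite(mse) else math.inf

def random_subtree(expr, rng):
    """Walk down from the root, stopping at a random node."""
    while isinstance(expr, tuple) and rng.random() < 0.5:
        expr = rng.choice(expr[1:])
    return expr

def graft(expr, piece, rng):
    """Return a copy of expr with one randomly chosen node replaced by piece."""
    if not isinstance(expr, tuple) or rng.random() < 0.3:
        return piece
    op, left, right = expr
    if rng.random() < 0.5:
        return (op, graft(left, piece, rng), right)
    return (op, left, graft(right, piece, rng))

def evolve(data, pop_size=100, generations=20, max_size=40, seed=0):
    """Rank candidates by fit, keep the best quarter, refill with offspring."""
    rng = random.Random(seed)
    population = [random_expr(rng, 3) for _ in range(pop_size)]
    history = []  # best error seen at the start of each generation
    for _ in range(generations):
        population.sort(key=lambda e: error(e, data))
        history.append(error(population[0], data))
        survivors = population[: pop_size // 4]
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(survivors), rng.choice(survivors)
            child = graft(a, random_subtree(b, rng), rng)       # recombine
            if rng.random() < 0.3:
                child = graft(child, random_expr(rng, 2), rng)  # change
            if size(child) <= max_size:                         # limit bloat
                children.append(child)
        population = survivors + children
    best = min(population, key=lambda e: error(e, data))
    return best, history

# Invented toy data drawn from y = x^2 + x.
data = [(x / 2, (x / 2) ** 2 + x / 2) for x in range(-4, 5)]
best, history = evolve(data)
print(best, history[0], history[-1])
```

Because the best quarter of each generation survives unchanged, the best error in `history` can never get worse from one generation to the next. Whether the search actually recovers the underlying equation depends on the seed and settings.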
“If you carefully plan an experiment to be the one that causes the
most disagreement between two
theories, that’s the most efficient experiment you can do,” says Lipson. By
finding the disagreements and measuring how much error the different
equations produced at those points,
the computer was able to refine the
theoretical models to find the one
set of equations that best fit the data.
In some cases, Lipson says, the technique can even develop a full dynamical model starting from scratch, without any prior knowledge.
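Lipson's point about planning the experiment that causes the most disagreement can be made concrete with a small sketch. It assumes two invented candidate models that both fit the existing data perfectly, a hypothetical list of inputs the lab could test next, and a made-up measurement. The function names are illustrative and are not taken from Eureqa.

```python
def disagreement(models, x):
    """Spread between the highest and lowest prediction at input x."""
    predictions = [model(x) for _, model in models]
    return max(predictions) - min(predictions)

def pick_experiment(models, candidate_inputs):
    """Choose the input at which the competing models disagree the most."""
    return max(candidate_inputs, key=lambda x: disagreement(models, x))

def rank_models(models, data):
    """Return model names ordered from best to worst mean squared error."""
    def mse(model):
        return sum((model(x) - y) ** 2 for x, y in data) / len(data)
    return [name for name, _ in sorted(models, key=lambda nm: mse(nm[1]))]

# Two invented theories that agree on every measurement taken so far.
models = [("linear", lambda x: x), ("quadratic", lambda x: x * x)]
data = [(0.0, 0.0), (1.0, 1.0)]

# Inputs the lab could test next (hypothetical).
next_x = pick_experiment(models, [0.0, 0.5, 1.0, 2.0, 3.0])

# Hypothetical measurement at the chosen input.
measured_y = 9.0
data.append((next_x, measured_y))

ranking = rank_models(models, data)
print(next_x, ranking)
```

Testing at 0 or 1 would tell the two theories nothing new, since both predict the same value there. The chosen input is the one where a single measurement separates them most sharply.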
Lipson and then-doctoral student
Michael Schmidt started developing
Eureqa, the algorithm the work was