A new system allows researchers to discover, reuse, cite, and experiment upon any computational result that is published with a Verifiable Result Identifier.
By Matan Gavish, David Donoho, and Amos Onn
Scientific researchers generate vast amounts of knowledge and archive it in the scientific literature. The hundreds of thousands of scientific papers published annually represent, in highly concentrated form, billions of dollars in research funds and millions of work hours by highly trained individuals.
In principle, this makes the scientific literature a fantastic resource base. In practice, knowledge is discovered in the literature primarily through Web searches and extracted from it by labor-intensive methods. Here are a few tasks of considerable interest that, in principle, we could accomplish using the knowledge available today:
• Show a table of effect sizes and p-values in all published phase-three clinical trials of melanoma treatments.
• Name all image denoising algorithms ever used to remove white noise from the famous “Barbara” image.
• List all of the classifiers applied to the famous acute lymphoblastic leukemia dataset [1], along with their type-1 and type-2 error rates.
• Create a unified dataset containing all published whole-genome sequences with an identified mutation in the BRCA1 gene.
• Randomly reassign treatment and control labels to cases in published clinical trial X and calculate the effect size. Repeat many times and create a histogram of the effect sizes. Perform this for every clinical trial published in 2003, and list each trial's name beside its histogram.
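The last task above amounts to a permutation (randomization) test. As a minimal sketch, assuming one trial's per-patient outcomes were already available as two lists of numbers, and taking the difference in group means as the effect size (both are assumptions; the article specifies neither), the reassign-and-repeat loop and the histogram might look like this in Python. The function names and sample outcomes are hypothetical.

```python
import random
from statistics import mean


def effect_size(treated, control):
    """Difference in group means (an assumed, simple effect-size measure)."""
    return mean(treated) - mean(control)


def permutation_effect_sizes(treated, control, n_perm=1000, seed=0):
    """Randomly reassign treatment/control labels n_perm times and
    return the effect size computed under each reassignment."""
    rng = random.Random(seed)  # seeded so the experiment is repeatable
    pooled = list(treated) + list(control)
    n_treated = len(treated)
    sizes = []
    for _ in range(n_perm):
        rng.shuffle(pooled)  # a fresh random labeling of all cases
        sizes.append(effect_size(pooled[:n_treated], pooled[n_treated:]))
    return sizes


def histogram(values, n_bins=10):
    """Count values falling into n_bins equal-width bins spanning [min, max]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins if hi > lo else 1.0
    counts = [0] * n_bins
    for v in values:
        i = min(int((v - lo) / width), n_bins - 1)  # max value goes in the last bin
        counts[i] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical outcomes for a single trial "X"; not real trial data.
    treated = [5.1, 6.2, 5.8, 6.5, 7.0]
    control = [4.9, 5.0, 5.3, 4.7, 5.2]
    sizes = permutation_effect_sizes(treated, control, n_perm=500, seed=42)
    print("observed effect size:", effect_size(treated, control))
    print("permutation histogram:", histogram(sizes, n_bins=10))
```

In a dream application the hard part is not this loop but obtaining `treated` and `control` automatically for every trial published in 2003; the sketch covers only what happens once the data are in hand.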
Although all the required information is available in some digital form, each of these tasks, and countless similar tasks of significant interest, currently requires a prohibitive amount of “manual” labor.
Computer applications that could automate these tasks, and more generally search, discover, amalgamate, reuse, and experiment upon published computational results, would profoundly change science and provide an overwhelming justification for its ongoing digitization. We refer to these applications as “dream applications.” None of them is feasible for the computational results currently published in the scientific literature.