Large genomic databases with interactive access
require new, layered abstractions, including
separating “evidence” from “inference.”
By Vineet Bafna, Alin Deutsch, Andrew Heiberg,
Christos Kozanitis, Lucila Ohno-Machado,
and George Varghese
Humans are a product of nature and nurture, meaning
our phenotype (the composite of all outward, measurable
characteristics, including our health parameters) is a
function of two things: our genotype (the DNA program in
all cells) and the environment (all inputs to a human, like
food and medicine). This arrangement is analogous to
how the output of a program (such as a search engine)
is a function of both the program and
the input (keywords typed by a user).
Using the same input with a different
program (such as Google search vs.
Bing) can result in different output.
In this analogy, the role of the medical
professional is to provide information
that is “diagnostic” (such as, “Is there
a bug in the program based on observed output?”), “prognostic” (such
as, “Can output/outcome be predicted, given specific inputs, like diet?”),
or “therapeutic” (such as, “Can a specific input,
like a drug, lead to the desired output?”). Also, the electronic
medical record (EMR) of a patient can
be viewed as an archive of previously
acquired inputs and outputs.

Key insights:
Making genomics interactive is potentially
transformative, similar to the shift from
batch processing to time sharing.
Analogous to Internet layering, genome
processing can be layered into
an instrument layer, an evidence layer,
and an inference layer.
A declarative query language we call
GQL enables automatic optimization
and provenance and privacy checks more
readily than procedural alternatives.
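
The program analogy can be made concrete with a small sketch. Everything in it is a hypothetical toy chosen for illustration, not a biological model: the two "genotypes," the glucose formula, and the function names `express`, `diagnose`, and `find_dose` are all assumptions. It shows only how the diagnostic, prognostic, and therapeutic questions map onto operations on a program and its inputs.

```python
import math
from typing import Callable, Dict, List, Optional

Environment = Dict[str, float]   # inputs: diet, drugs, ...
Phenotype = Dict[str, float]     # outputs: measurable traits
Genotype = Callable[[Environment], Phenotype]  # the "program"


def typical_genotype(env: Environment) -> Phenotype:
    # Toy rule (assumption): sugar raises glucose, the drug lowers it strongly.
    return {"glucose": 90 + 0.5 * env.get("sugar_g", 0) - 20 * env.get("drug_dose", 0)}


def variant_genotype(env: Environment) -> Phenotype:
    # A different "program": more sensitive to sugar, less responsive to the drug.
    return {"glucose": 90 + 1.0 * env.get("sugar_g", 0) - 5 * env.get("drug_dose", 0)}


def express(genotype: Genotype, env: Environment) -> Phenotype:
    """Prognostic: predict the output for a given program and input."""
    return genotype(env)


def diagnose(observed: Phenotype, env: Environment,
             candidates: Dict[str, Genotype]) -> List[str]:
    """Diagnostic: which candidate programs are consistent with the observed output?"""
    return [name for name, g in candidates.items()
            if math.isclose(g(env)["glucose"], observed["glucose"])]


def find_dose(genotype: Genotype, env: Environment,
              doses: List[float], target_max: float) -> Optional[float]:
    """Therapeutic: return the first listed dose that brings glucose to target, if any."""
    for dose in doses:
        if genotype({**env, "drug_dose": dose})["glucose"] <= target_max:
            return dose
    return None


candidates = {"typical": typical_genotype, "variant": variant_genotype}
diet = {"sugar_g": 40.0}

# Same input, different program, different output.
print(express(typical_genotype, diet), express(variant_genotype, diet))
print(diagnose({"glucose": 130.0}, diet, candidates))
print(find_dose(variant_genotype, diet, [0, 1, 2, 3, 4, 5, 6], 100.0))
```

In this toy, the same diet produces different glucose levels under the two programs, and the dose that works for one program is too low for the other. That is the intuition behind tailoring treatment to the individual program rather than to the observed output alone.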
Unlike a computer program, the human program is largely hidden.
Hence, traditional medicine is “depersonalized,”
with doctors providing treatment by
comparing the patient’s phenotype
(symptoms) against empirical observations of outputs
from a large number of individuals.
Limited customization is based on coarse classes,
like “race.” All this changed with the
sequencing of the human genome in
early 2000 and the subsequent drop
in costs from hundreds of millions
of dollars to $1,000 on small desktop
sequencing machines. The ability to
cheaply read the program of each human underlies the great promise of
personalized medicine, or treatment
based on symptoms and the patient’s
distinctive DNA program.
We frame this point with a clas-