sic example: The blood-thinner drug
Warfarin is widely prescribed to prevent blood clots. Dosage is critical;
too high and the patient can bleed to
death, too low and the drug might not
prevent life-threatening blood clots.
Often, the right dosage is established
through multiple visits to the clinic
and regular testing. However, recent
reports [16] suggest that knowledge of
the patient’s genetic program can
help establish the right dosage. We
outline this approach (genetic association and discovery workflow) in
three steps:
Collect samples. Collect a sample
of affected and “genetically matched”
control individuals; then sample DNA
and catalog variations;
Identify variations. Identify and report variations that co-segregate, or
correlate, with the affected/control
status of the individual; and
Follow up with studies. Follow up
on the genetic basis of the correlation
through costly studies and experiments in animal models and clinical
trials; then transfer knowledge to the
clinic.
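The second step above (identifying variations that correlate with affected/control status) can be sketched in a few lines of Python. This is a toy illustration, not the method used in any actual study: the `Individual` type, the variant labels, the cohort, and the `min_gap` threshold are all hypothetical, and the score is just the difference in carrier frequency between the two groups. Real association studies rely on formal statistical tests and correct for multiple testing.

```python
from collections import Counter
from typing import NamedTuple


class Individual(NamedTuple):
    name: str
    variants: frozenset  # hypothetical variant labels carried by this person
    affected: bool       # True for affected, False for matched control


def carrier_frequency_gap(cohort):
    """Map each variant to (carrier frequency in affected) - (in controls)."""
    affected = [p for p in cohort if p.affected]
    controls = [p for p in cohort if not p.affected]
    if not affected or not controls:
        raise ValueError("need at least one affected and one control individual")
    count_aff = Counter(v for p in affected for v in p.variants)
    count_ctl = Counter(v for p in controls for v in p.variants)
    return {
        v: count_aff[v] / len(affected) - count_ctl[v] / len(controls)
        for v in set(count_aff) | set(count_ctl)
    }


def report_candidates(cohort, min_gap=0.5):
    """Return variants whose frequency gap is at least min_gap, largest first."""
    gaps = carrier_frequency_gap(cohort)
    hits = [v for v, g in gaps.items() if abs(g) >= min_gap]
    return sorted(hits, key=lambda v: (-abs(gaps[v]), v))


# Made-up cohort: three affected individuals and three controls.
COHORT = [
    Individual("a1", frozenset({"v1", "v2"}), True),
    Individual("a2", frozenset({"v1"}), True),
    Individual("a3", frozenset({"v1", "v3"}), True),
    Individual("c1", frozenset({"v2"}), False),
    Individual("c2", frozenset({"v3"}), False),
    Individual("c3", frozenset(), False),
]

if __name__ == "__main__":
    print(report_candidates(COHORT))
```

In this made-up cohort only `v1` separates the two groups; `v2` and `v3` are equally common in both and are not reported.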
Even with its success, the discovery approach involves complications:
First, studies are resource-intensive, requiring the identification and
sequencing of large cohorts of individuals with and without a disease.
Second, it is unclear how to apply study results to a specific individual,
especially one genetically different from the investigated cohort.
Finally, data reuse is difficult; significant computation is needed to
dig out data from a previous study, and much care is required to reuse it.
We contrast the “discovery workflow” with “personalized medicine.”
Here, a physician treating individual A may query a database for treatments
suitable for patients with genetic variations similar to those of A, or
query for patients genetically similar to A to find treatments and dosages
that worked well for these patients.
Figure 1. Universal sequencing, discovery, and personalized medicine.
Assume every individual is sequenced at birth. In discovery, clinical geneticists logically
select a subset of individuals with a specific phenotype (such as disease) and another
without the phenotype, then identify genetic determinants for the phenotype. By contrast,
in personalized medicine, medical professionals retrieve the medical records of all patients
genetically similar to a sick patient S.
[Figure labels: g1, g2, ..., gn; M1, M2, ..., Mn; gs, Ms; M; “Personalized Medicine”; “Discovery workflow.”]
For a patient, a medical team might
identify a collection of individuals
genetically similar to the patient and
on a Warfarin regimen; query their
genomes and electronic medical records (EMRs) for genetic variation in
candidate genes and Warfarin dosage, respectively; and choose the
appropriate dosage based on the patient’s specific genetic program. The
ability to logically select from a very
large database of individuals using
the phenotype as a key removes the
first problem with the discovery workflow. Using genetic variations specific
to individual A as a key to return treatments (that work well for such
variations) addresses the second problem.
Finally, if the accompanying software
system has good abstractions, then
the third problem (computational
burden to reuse data) is greatly eased.
Here we focus on key software abstractions for genomics, suggesting that,
as in other CS areas (such as VLSI/systems), software abstractions will enable genomic medicine.
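The personalized-medicine query described above (select genetically similar individuals on a Warfarin regimen, then read off their dosages) might look like the following minimal Python sketch. Everything here is an assumption made for illustration: the `Record` layout, the variant labels, the dose values, the use of Jaccard similarity over variant sets as the notion of "genetically similar," and the median as the summary. It is not clinical guidance and not the query language proposed later in this article.

```python
from dataclasses import dataclass
from statistics import median
from typing import Optional


@dataclass(frozen=True)
class Record:
    patient_id: str
    variants: frozenset                # hypothetical variants in candidate genes
    warfarin_dose_mg: Optional[float]  # None if the patient is not on Warfarin


def jaccard(a, b):
    """Similarity of two variant sets: |intersection| / |union|."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_patient_doses(db, query_variants, min_similarity=0.5):
    """Doses of patients on Warfarin whose variants resemble the query's."""
    return [
        r.warfarin_dose_mg
        for r in db
        if r.warfarin_dose_mg is not None
        and jaccard(r.variants, query_variants) >= min_similarity
    ]


def suggest_dose(db, query_variants, min_similarity=0.5):
    """Median dose among similar patients, or None if nobody is similar."""
    doses = similar_patient_doses(db, query_variants, min_similarity)
    return median(doses) if doses else None


# Made-up database: variant labels and doses are purely illustrative.
DB = [
    Record("p1", frozenset({"g1:A", "g2:T"}), 3.0),
    Record("p2", frozenset({"g1:A", "g2:T", "g3:C"}), 4.0),
    Record("p3", frozenset({"g1:G"}), 7.0),
    Record("p4", frozenset({"g1:A", "g2:T"}), None),  # not on Warfarin
]

if __name__ == "__main__":
    patient_a = frozenset({"g1:A", "g2:T"})
    print(suggest_dose(DB, patient_a))
```

The point of the sketch is the shape of the query: the patient's own variations serve as the key, and the phenotype of interest (here, being on a Warfarin regimen) acts as a filter, so no new cohort has to be assembled for each question.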
We start with basic genetics using
programming metaphors, describe trends in sequencing and how
genetic variations are called today, and then
outline our vision for a vast genomic
database built in layers; the key idea is
the separation of “evidence” and “inference.”
We then propose a language
for specifying genome queries and
end by outlining research directions
for other areas of computer science to
further this vision.
We limit our scope to genomics,
ignoring dynamic aspects of genomic
analysis (such as transcriptomics, proteomics expression, and networks).
Genomic information is traditionally analyzed using two complementary paradigms. The first is comparative
genomics, where different species
are compared; most regions are dissimilar, and the conserved regions are
functionally interesting [6, 7]. The second
is population genomics, where genomes from a single population are
compared under the baseline hypothesis that the genomes are identical, and
it is the variations that define phenotypes and are functionally interesting.
We focus on population genomics and
its application to personalized medicine and do not discuss specific sequencing technologies (such as strobe
sequencing vs. color space encoding).