Viewpoint
A comparable event occurred at the 2006 International Genetically Engineered Machine (iGEM) competition, which brought together graduate students from 15 countries majoring in biology, computer science, and electrical engineering (parts2.mit.edu/wiki/index.php/Main_Page). The
apparently innocuous test of changing a substance’s
fragrance by adding special ingredients has enormous
practical implications, most notably in environmental remediation and pharmaceutical engineering.
Imagine that the ingredient being added to the dish contains a dangerous chemical. The experiment allows investigators to detect the presence of the substance simply by exposing it to the bacteria and letting humans, animals, or even instruments sense the chemical changes that generate the fragrance.
What does this have to do with computer science?
The question and its answer are at the heart of synthetic biology. Its objective is to reengineer cells by changing or supplementing their DNA so that the resulting cell products are sensitive to given substances and can signal, and eventually eliminate, their presence. For example, synthetic biology could potentially engineer harmless bacteria capable of detecting
and absorbing oil spills or breaking down carbon
dioxide in the atmosphere.
The entity that links synthetic biology to computer science is DNA, which can be viewed as a program that remains static, or dormant, in a computer-like memory. Only when it is executed by processors, the biological equivalent of interpreters and hardware, does its dynamic behavior come to life.
From its origins in the early 1980s, bioinformatics, combining computer science and biology, has
dealt mostly with the static properties of DNA and
its products, like RNA and proteins. With the
automation of DNA sequencing in the mid-1980s,
the immediate goal was to obtain sequences of letters—A, C, G, and T—identifying the basic
nucleotides that characterize all living matter, from
bacteria to humans. The project of sequencing a
variety of genomes is still formidable; a recent press
release reported “The theoretical price of having
one’s personal genome sequenced just fell from the
prohibitive $20 million to about $2.2 million, and
the goal is to reduce the amount further—to about
$1,000—to make individualized prevention and
treatment realistic” [7].
Following the principles of Darwinian evolution,
bioinformatics specialists have scrutinized similarities
among the static DNA of various species. In fact, nearly all research on such similarities to date has been done at the static level, without great concern for the dynamics triggered when DNA is processed by actors such as polymerases and ribosomes. These actors are essentially the nanobiological machinery that processes DNA to produce proteins, the building blocks of life.
Current efforts in bioinformatics also seek to
determine protein structure and function. Most
research has concentrated on identifying the stable
3D shape of a static molecule, even though protein molecules have degrees of flexibility that are relevant in determining their function. The concern
for static sequences and 3D structures is still amply
justified, since studying the dynamics of DNA and
protein interaction is nearly impossible without a
thorough study of their static counterparts.
Protein function is specified through informal
natural language sentences that describe the role of a
protein in a living cell. Such descriptions must still be complemented by formal specifications that, for example, indicate the protein's role in a network of protein interactions. Computer scientists are needed to design these specifications.
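As a minimal sketch of what such a formal specification might look like, consider representing the interaction network as a labeled graph in which a protein's "role" is its neighborhood; the protein names and interaction types below are hypothetical placeholders, not curated data:

```python
# A sketch of a formal specification of protein function: the protein's
# role expressed as its neighborhood in an interaction network.
# All protein names and interaction types are hypothetical placeholders.
from collections import defaultdict


class InteractionNetwork:
    """An undirected graph: nodes are proteins, edges carry an
    interaction type (for example, binding or phosphorylation)."""

    def __init__(self):
        # protein -> {partner: interaction type}
        self.edges = defaultdict(dict)

    def add_interaction(self, a, b, kind):
        self.edges[a][b] = kind
        self.edges[b][a] = kind

    def role(self, protein):
        """A formal stand-in for an informal textual description:
        the protein's partners and how it interacts with each."""
        return sorted(self.edges[protein].items())


net = InteractionNetwork()
net.add_interaction("ProteinA", "ProteinB", "phosphorylates")  # hypothetical
net.add_interaction("ProteinA", "ProteinC", "binds")           # hypothetical
print(net.role("ProteinA"))
# [('ProteinB', 'phosphorylates'), ('ProteinC', 'binds')]
```

Unlike a natural-language sentence, such a structure can be queried, compared across species, and checked for consistency by a program.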
Systems biology studies the dynamic properties of
the interactions between DNA and its products. Even
if this new field is viewed as a branch of bioinformatics, it is already an area of significant interest to biologists, computer scientists, control engineers, and
mathematicians. In dealing with static DNA and its
products, including protein sequences, fundamental
computer science algorithms operate on strings of symbols. They search for approximate patterns in very long
sequences, compare multiple sequences, and combine
overlapping sequences. Approximate pattern matching is fundamentally an optimization problem: the algorithms are designed to minimize the cost of abstractly transforming one sequence into another.
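A minimal sketch of this cost-minimization idea is the classic dynamic-programming computation of edit distance; the unit costs and example sequences here are illustrative assumptions, not taken from the text:

```python
# A sketch of the cost-minimization idea behind approximate pattern
# matching: the classic dynamic-programming edit distance, i.e., the
# minimum number of insertions, deletions, and substitutions needed
# to transform one sequence into another (unit costs assumed).
def edit_distance(s, t):
    m, n = len(s), len(t)
    # dist[i][j] = minimum cost of transforming s[:i] into t[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters of s[:i]
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,       # delete s[i-1]
                dist[i][j - 1] + 1,       # insert t[j-1]
                dist[i - 1][j - 1] + sub  # match or substitute
            )
    return dist[m][n]


print(edit_distance("GATTACA", "GCATGC"))  # 4
```

Production sequence-comparison tools build on this same recurrence, replacing the unit costs with biologically motivated substitution scores and gap penalties.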