A comparable event occurred at the 2006 International Genetically Engineered Machines (iGEM) competition, which brought together graduate students from 15 countries majoring in biology, computer science, and electrical engineering (parts2.mit.edu/wiki/index.php/Main_Page). The apparently innocuous test of changing a substance’s fragrance by adding special ingredients has enormous practical implications, most notably in environmental remediation and pharmaceutical engineering. Imagine that the ingredient being added to the dish contains a dangerous chemical. The experiment allows investigators to detect the presence of that chemical simply by exposing it to bacteria and having humans, animals, or even instrumentation detect the chemical changes generating the fragrance.
What does this have to do with computer science? The question and its answer are at the heart of synthetic biology. Its objective is to reengineer cells by changing or supplementing their DNA so the ensuing cell products are sensitive to given substances and can indicate and eventually eradicate their presence. For example, synthetic biology could potentially engineer harmless bacteria capable of detecting and absorbing oil spills or breaking down carbon dioxide in the atmosphere.
The entity that links synthetic biology to computer science is DNA, which can be viewed as a program that remains static or dormant in a computer-like memory. Only when it is executed by processors (the equivalent of interpreters and hardware) does its dynamic behavior come to life.
From its origins in the early 1980s, bioinformatics, combining computer science and biology, has dealt mostly with the static properties of DNA and its products, like RNA and proteins. With the automation of DNA sequencing in the mid-1980s, the immediate goal was to obtain sequences of letters (A, C, G, and T) identifying the basic nucleotides that characterize all living matter, from bacteria to humans. The project of sequencing a variety of genomes is still formidable; a recent press release reported “The theoretical price of having one’s personal genome sequenced just fell from the prohibitive $20 million to about $2.2 million, and the goal is to reduce the amount further—to about $1,000—to make individualized prevention and treatment realistic” [7].
Following the principles of Darwinian evolution, bioinformatics specialists have scrutinized similarities among the static DNA of various species. In fact, nearly all research on similarities to date has been done at the static level, without great concern for the dynamics triggered when DNA is processed by actors, like polymerases and ribosomes. Such actors are essentially the nanobiological machinery that processes DNA to produce proteins, the building blocks of life.
Current efforts in bioinformatics also seek to determine protein structure and function. Most research has concentrated on identifying the stable 3D shape of a static molecule, even though protein molecules have degrees of flexibility that are relevant in determining a molecule’s function. The concern for static sequences and 3D structures is still amply justified, since studying the dynamics of DNA and protein interaction is nearly impossible without a thorough study of their static counterparts.
Protein function is specified through informal natural language sentences that describe the role of a protein in a living cell. This description must still be complemented with formal specifications by, for example, indicating the protein’s role in a network of protein interactions. Computer scientists are needed to design these specifications.
Systems biology studies the dynamic properties of the interactions between DNA and its products. Even if this new field is viewed as a branch of bioinformatics, it is already an area of significant interest to biologists, computer scientists, control engineers, and mathematicians. In dealing with static DNA and its products, including protein sequences, fundamental computer science algorithms operate on strings of symbols. They search for approximate patterns in very long sequences, compare multiple sequences, and combine overlapping sequences. Optimization is the objective of the approximate pattern-matching of sequences; the algorithms are designed to minimize the cost of abstractly transforming one sequence into another.
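The idea of minimizing the cost of abstractly transforming one sequence into another can be illustrated with a classic edit-distance computation. The following is only a minimal sketch, not the method of any particular bioinformatics tool: the function name `edit_distance` and the unit costs for substitutions and for insertions or deletions are assumptions chosen for illustration, and practical tools typically use more elaborate, biologically motivated scoring schemes.

```python
def edit_distance(source: str, target: str,
                  sub_cost: int = 1, indel_cost: int = 1) -> int:
    """Return the minimum total cost of transforming `source` into `target`.

    Allowed operations (costs are illustrative assumptions):
      - substitute one letter for another: sub_cost
      - insert or delete one letter:       indel_cost
    """
    # prev[j] = cost of turning the already-processed prefix of `source`
    # into the first j letters of `target`. Initially that prefix is empty,
    # so the cost is j insertions.
    prev = [j * indel_cost for j in range(len(target) + 1)]

    for i, s in enumerate(source, start=1):
        # Turning the first i letters of `source` into an empty string
        # takes i deletions.
        curr = [i * indel_cost]
        for j, t in enumerate(target, start=1):
            keep_or_substitute = prev[j - 1] + (0 if s == t else sub_cost)
            delete_from_source = prev[j] + indel_cost
            insert_into_source = curr[j - 1] + indel_cost
            curr.append(min(keep_or_substitute,
                            delete_from_source,
                            insert_into_source))
        prev = curr

    return prev[-1]


if __name__ == "__main__":
    # One deletion (the C) turns ACGT into AGT.
    print(edit_distance("ACGT", "AGT"))
```

The table is filled row by row, so each entry reuses the best costs already computed for shorter prefixes; this is the dynamic-programming pattern that also underlies pairwise sequence alignment.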