The XRDS blog highlights a range of topics from conference coverage, to security
and privacy, to CS theory. Selected blog posts, edited for print, are featured in
every issue. Please visit xrds.acm.org/blog to read each post in its entirety. If you
are interested in joining as a student blogger, please contact us.
Cancer: From biology to computer science
By Abdelrahman Hosny
Cancer is classified as a genetic disease caused by abnormal cell division that destroys body tissue.
Wait! Cell? Body tissue? Disease? You might be wondering: What does this have to do with computers?
In fact, cancer research has been at the heart of life
sciences for the past few decades. Because genetics plays an
important role in most cancers, computational methods are
crucial in understanding the development of the disease, as
well as predicting the results of clinical trials for treatment.
That’s where computer science comes into action.
Before we define the computational problem, let’s review
some high school biology and learn a few facts.
The human body consists of trillions of cells.
Although every cell in your body has the exact same DNA,
each cell carries out its own function. DNA
is a long sequence of nucleotides preserved inside the
cell nucleus [1]. We represent DNA as a sequence
of characters drawn from four letters: A, C, G, and T
(adenine, cytosine, guanine, and thymine, respectively).
For example, a portion of DNA might look like:
ACGTCCGGTAAATGGCGTA. In humans, the length of
DNA is about three billion nucleotides, that is, three billion
characters. If we stored this sequence in a plain text file
at one byte per character, the file would be about 3 GB. (Actually,
real files turn out to be much larger than that, since we store
additional information too.)
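
As a rough sketch of that estimate, assuming one byte per character (plain ASCII) and decimal gigabytes, the size of such a file can be worked out in Python. The names GENOME_LENGTH, text_file_size_gb, and is_valid_dna are illustrative and do not come from any genomics library.

```python
# Back-of-the-envelope size of a plain-text genome file.
# Assumptions: one byte per character (ASCII) and 1 GB = 10**9 bytes.

GENOME_LENGTH = 3_000_000_000  # approximate number of nucleotides in human DNA


def text_file_size_gb(num_nucleotides: int, bytes_per_char: int = 1) -> float:
    """Return the size in GB of a text file holding the given number of characters."""
    return num_nucleotides * bytes_per_char / 10**9


def is_valid_dna(sequence: str) -> bool:
    """Return True if the sequence uses only the four letters A, C, G, and T."""
    return set(sequence) <= set("ACGT")


fragment = "ACGTCCGGTAAATGGCGTA"  # the example fragment from the text
print(is_valid_dna(fragment))            # True
print(len(fragment))                     # 19
print(text_file_size_gb(GENOME_LENGTH))  # 3.0
```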
Some parts of the DNA are copied from inside the
nucleus to the outside. The copied regions are ultimately
converted into something that makes the cell functional.
For example, cells of the stomach copy regions of the
DNA that will ultimately be converted into enzymes
that help digest food. Cells of the eyes copy regions of
the DNA that will ultimately be converted into products
that sense light. Regions that are copied from inside the
nucleus to the outside are called “genes.” For example, in
ACGTCCGGTAAATGGCGTA, we might have the first three
characters and the last three characters copied together
and concatenated to form the gene ACGGTA, which will
ultimately be converted into something useful.
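
The toy example above can be reproduced with string slicing. This is only a sketch of the concatenation step as the text describes it, not a model of how cells actually copy genes; the helper name splice and the region format are assumptions made for illustration.

```python
# Toy illustration: build a "gene" by concatenating copied regions of a DNA string.
# Regions are half-open (start, end) index pairs, a convention chosen for this sketch.

dna = "ACGTCCGGTAAATGGCGTA"


def splice(sequence: str, regions: list[tuple[int, int]]) -> str:
    """Concatenate the given regions of the sequence, in the order listed."""
    return "".join(sequence[start:end] for start, end in regions)


n = len(dna)
# Copy the first three and the last three characters, then join them.
gene = splice(dna, [(0, 3), (n - 3, n)])
print(gene)  # ACGGTA
```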
Among the three billion nucleotides of our DNA, there are
about 20,000 genes of different lengths. If we suppose the
average length of a gene is 1,000 nucleotides, then all genes
together span 20,000,000 nucleotides; that is
less than 1 percent of the total length of the DNA (3 billion).
Does this mean most of the DNA is not useful because it
doesn’t encode genes? The answer is no! It’s extremely
useful, as we will see shortly.
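
The arithmetic behind that "less than 1 percent" figure can be checked in a few lines. The average gene length of 1,000 nucleotides is the supposition made in the text, not a measured value, and the variable names are illustrative.

```python
# Fraction of the DNA covered by genes, using the article's round numbers.

NUM_GENES = 20_000              # approximate number of genes
AVG_GENE_LENGTH = 1_000         # supposed average gene length, in nucleotides
GENOME_LENGTH = 3_000_000_000   # approximate length of human DNA, in nucleotides

total_gene_length = NUM_GENES * AVG_GENE_LENGTH
fraction = total_gene_length / GENOME_LENGTH

print(total_gene_length)   # 20000000
print(f"{fraction:.2%}")   # 0.67%
```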
At every single moment, cells divide and replicate inside
our bodies. Cell replication is also what we call growing up.
As cells replace dead cells, or just replicate to
carry out different functions, they form new, identical
daughter cells. This replication starts with obtaining two