identical copies of the DNA that will be contained in the
nucleus of a separate cell. However during replication,
some nucleotides might get mutated in the DNA copying
process. In general, mutations are generally fine as long
as the copy process occurs in micro-seconds. It is expected mutations can happen during this high-speed copying
But what about harmful mutations?
If it happens that a mutation occurred in the part of
the DNA that encodes genes, this might be a problem
leading genes to be either over-expressed or under-expressed. It’s very likely to have mutations. However,
luckily, if mutations occur, there is only less than 1
percent probability that this mutation lies in a gene
region. But what if you are unlucky and a mutation
happens in a DNA region that encodes for an important
gene? There is a backup plan, called “programmed cell
death.” A cell simply commits “suicide” if it discovers
that something went wrong. So how does the cell know
that something went wrong during the copy process?
There is a group of genes called tumor-suppressor
genes, which monitor cell replication and will explode
it if it fails to pass the “checkpoint”—meaning there are
no mutations [ 2].
When a cancerous cell is formed:
˲ Mutations happen in the 1 percent gene regions of
˲ Tumor suppressor genes fail to kill the
˲ Another type of genes, called oncogenes, promote the
replication process surviving all attempts of killing it [ 3].
The result is a malfunctioned daughter cell that will
in turn replicate to form more malfunctioned cells.
What’s next is uncontrolled and abnormal cell division
that forms tumors, which we call cancer. Normally, a
single mutation is not enough for tumors to form. Cancer
is a result of the accumulation of a large number of
mutations over a long period of time, which makes the
cell resist all trials of being killed naturally.
A COMPU TATIONAL PROBLEM
Curing cancer is very difficult because every patient has
their own story. Mutations that happened in Patient X
and caused the cell to divide abnormally is not the same
as the mutations that happened in Patient Y. That’s why
there is no one treatment that cures all patients. Although we know some gene variations/mutations cause
cancer, there are so many other factors that actually
promote the disease.
Now, let us define our computational problem.
Given genomic datasets of cancer patients, can we
design experiments to do the following:
˲ Exactly define what mutations caused cancer for each
˲ Find correlations between the genomic features of
the patients and the type of cancer?
˲ Identify biomarkers in the DNA that lead to the
success or failure of a specific type of treatment?
˲ Get a better understanding on the development of the
disease over time?
˲ Customize treatment for cancer patients?
˲ Predict the outcome of clinical trials?
And the list of questions goes on. The challenge
is that genomic data is very heterogeneous. The data
comes in many different formats. In addition, the high
dimensionality of genomic data (thousands of features/
genes) and the relatively small number of samples available
make current statistical analysis and machine learning
Today, cancer research is more focused on
computational methods that can give us significant and
meaningful insights about the disease and how to treat
it. If you are someone with serious computational skills,
and you feel comfortable cleaning and analyzing tens of
gigabytes of data, the computational biology community
needs you. Data analytics, statistical methods, and
machine learning techniques are not monopolized by
the software applications community. Indeed, these
skills are now required for helping doctors make better
decisions for their patients.
If you feel motivated, join the cause and study CS
[ 1] Sawicki, M. P., Samara, G., Hur witz, M., and Passaro, E. Human genome project.
Am. J. Surg. 165, 2 (1993), 258–264.
[ 2] Klein, G. The approaching era of the tumor suppressor genes. Science 238, 4833
[ 3] Watson, J. V. Oncogenes, cancer and analytical cytology. Cytometry 7, 5 (1986),
I especially thank my advisor, Prof. Sheida Nabavi, for her continuous support.
I’d like also to thank my sister, Eman Hosny, for bridging my research in computer science
Abdelrahman Hosny is a computer science master’s student at the University
of Connecticut. He carries out research in the intersection between computer science
and biology. More specifically, he applies computational methods to genomic datasets
that aim at a better understanding of the cancer disease and its treatment.
Slow Internet speeds inhibit Beirut,
the capital of Lebanon, from being a hub
for startups in the region.