News
Science | DOI: 10.1145/2184319.2184324
Gregory Goth
Analyzing Medical Data
Electronic patient records contain a treasure trove of data,
and researchers are using natural language processing technology
to mine the structured data and free text.
A patient cohort network in Søren Brunak et al.’s Computational Biology paper: nodes represent
patients, edges are correlations between patients, and node color denotes cluster membership.
One of the technological ironies in health care is the disconnect between the advanced state of clinical technology, such as nonconfining open imaging,
smartphone health apps, and surgical robots,
and the backward state
of electronic patient records.
Courtesy of the Center for Biological Sequence Analysis, Technical University of Denmark.
Until the passage of the Health Information Technology for Economic
and Clinical Health Act in the U.S. three
years ago, only an estimated 20% of
U.S.-based physicians used electronic
patient records. That percentage is rapidly increasing due to the law’s financial incentives, but the new attention is
also awakening researchers to the limitations of the structured data that is often exchanged between physicians, insurance companies, and organizations
interested in compiling and repurposing discrete patient records to conduct
population-based medical research.
Søren Brunak, director of the Center for Biological Sequence Analysis at
the Technical University of Denmark,
is one of the researchers using natural language processing (NLP)
technology to mine not only structured
data such as standardized disease
codes, but also free text.
In a recent paper, “Using Electronic Patient Records to Discover Disease
Correlations and Stratify Patient Cohorts,” published in Computational
Biology, Brunak and his colleagues showed that combining free-text
analysis with structured disease definition codes can help researchers
discover unexpected connections between diseases, such as a link between
migraine headaches and alopecia (hair loss), as well as the lack of
expected comorbidities, such as those between diseases coded as
“mental and behavioral disorders” and those coded under the
“drug abuse, liver disease, HIV” clusters.