tient privacy and who should be able to
contribute to health records, bioinformaticists are researching how to perfect NLP capabilities.
One NLP issue, says Denny, is that
the majority of natural language processing technology has been created
for dictated speech, but electronic
health records are often typed.
“Typed documents are more ambiguous because there are more
abbreviations and acronyms, and the documents usually contain
more misspellings,” says Denny. “You don’t describe things in as
much detail, so that hinders a little bit the richness of what can
be done with natural language processing. Maybe speech-recognition
technology will move forward fast enough that we’ll move toward
documents that look like dictated documents, and the problems
won’t be as big.”
William Cohen, research professor of machine learning at Carnegie
Mellon University, is researching a domain-specific version of the
Never Ending Language Learner (NELL), called BioNELL, that refines
NELL’s general-purpose Web-crawling algorithm for use on published
biomedical literature. While BioNELL is focused on published
biomedical literature rather than on-the-fly clinical notes, Cohen
says such domain-specific, rank-and-learn principles might be
useful for organizations’ in-house lexicons while wider standards
are being crafted.
“The exciting thing I find about
BioNELL is taking the existing structured databases and using them to
kick-start an NLP system,” Cohen
says. “Symmetrically, you might want
to take a natural language corpus and
use that to kick-start understanding a
sensor database.”
Brunak says that while the attention paid to his group’s
Computational Biology paper about unlikely comorbidities was
interesting, the broader message should be that examining more
hitherto machine-unreadable data could alter the practice of
medicine.
“Real patients have more than one
disease,” says Brunak, “and the patient records give us an opportunity
to discover comorbidities and disease
correlations—not only those that co-
occur but also disease trajectories, that
is, those that come before others. The
message should be that we can start
disease profiles of real patients instead
of doing what medicine has done for
hundreds of years, studying people disease by disease.”
Further Reading
Barrett, N. and Weber-Jahnke, J.H.
Applying natural language processing
toolkits to electronic health records: an experience report, Studies in Health
Technology and Informatics, 143, 2009.
Denny, J.C., et al.
PheWAS: demonstrating the feasibility of
a phenome-wide scan to discover gene-disease associations, Bioinformatics 26, 9,
May 1, 2010.
Murff, H.J., et al.
Automated identification of postoperative
complications within an electronic medical
record using natural language processing,
Journal of the American Medical
Association 306, 8, August 24, 2011.
Rosenbloom, S.T., Stead, W.W., Giuse, D.,
Lorenzi, N.M., Brown, S.H., and Johnson, K.B.
Generating clinical notes for electronic
health record systems, Applied Clinical
Informatics 1, 3, Jan. 1, 2010.
Savova, G.K., et al.
Discovering peripheral arterial disease
cases from radiology notes using natural
language processing, Proceedings of the
Annual Symposium of the American Medical
Informatics Association, Washington, D.C.,
Nov. 13–17, 2010.
Gregory Goth is an Oakville, CT-based writer who
specializes in science and technology.
© 2012 ACM 0001-0782/12/06 $10.00
Milestones
Computer Science Awards
The American Academy of Arts and Sciences, John Simon Guggenheim
Memorial Foundation, National Science Foundation, and the Franklin
Institute recently honored leading computer scientists.
AMERICAN ACADEMY MEMBERS
The American Academy of Arts and Sciences named seven new members
in the section of computer sciences (including artificial
intelligence and information technologies). They are Arvind,
Massachusetts Institute of Technology; Robert P. Colwell, R&E
Colwell & Associates, Inc.; Irene Greif, IBM; M. Frans Kaashoek,
Massachusetts Institute of Technology; Michael Kearns, University
of Pennsylvania; Judea Pearl, University of California, Los
Angeles; and Jeffrey D. Ullman, Stanford University.
GUGGENHEIM FELLOW
The John Simon Guggenheim Memorial Foundation selected Susan
Landau, a visiting scholar at Harvard University, as a 2012 Fellow
in the category of computer science.
ALAN T. WATERMAN AWARD
The National Science Foundation (NSF) named Robert Wood, an
associate professor in Harvard University’s School of Engineering
and Applied Sciences, and Scott Aaronson, an associate professor of
electrical engineering and computer science at the Massachusetts
Institute of Technology, to receive the 2012 Alan T. Waterman
Award, which “recognizes an outstanding researcher under the age of
35 in any field of science or engineering NSF supports.”
BENJAMIN FRANKLIN MEDAL
The Franklin Institute awarded the Benjamin Franklin Medal in
Computer and Cognitive Science to Vladimir Vapnik, who is a
professor of computer science and statistics at Royal Holloway,
University of London, and holds a professorship in computer science
at Columbia University. Vapnik was recognized for “his fundamental
contributions to our understanding of machine learning” and “his
invention of widely used machine learning techniques.”
—Jack Rosenberger