Image by the authors.
al. 4 and Holzinger et al. 8 for discussions of other text-analysis methods.
This study demonstrates how LSA can be applied to sanitized medical records of patients with congestive heart failure. The aim is to identify patterns of association among terms of interest, and between those terms and the ICD codes that appear for the same patient. By “sanitized,” we mean documents from which protected health information (PHI) has been removed. The data was provided by Independence Blue Cross (IBX), a health insurer.
Some associations revealed through LSA in this study were expected (such as the one between hypertension and obesity). Identifying such obvious associations is nonetheless essential because it supports the credibility of the method. Other associations were less expected by medical experts (such as the infrequent association LSA identified between “hypertension” and “sciatica”). That association might reflect an issue specific to a single patient. This highlights the potential of LSA to surface associations among medical terms that point to cases requiring special attention, or to unexpected, possibly previously unknown, relationships. As we explain in the next section, which describes LSA, the associations LSA identifies might also include terms that never appear together in any document. Such terms are instead associated with one another through their joint association with another term.
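The indirect association described above can be illustrated with a small sketch. It rests on several assumptions, and it is not the study's actual pipeline or data:

- The corpus is a hypothetical set of six records. The terms are invented for illustration, including “diabetes” and “back_pain.”
- Terms are weighted by raw counts. Practical LSA often applies tf-idf or log-entropy weighting first.
- k = 2 latent dimensions are retained.

In this toy corpus, “hypertension” and “diabetes” never share a record, but both co-occur with “obesity.”

```python
import numpy as np

# Hypothetical toy corpus: each inner list stands for the terms found in one
# sanitized record. These terms are illustrative only, not study data.
docs = [
    ["hypertension", "obesity"],
    ["obesity", "diabetes"],
    ["hypertension", "obesity"],
    ["obesity", "diabetes"],
    ["sciatica", "back_pain"],
    ["sciatica", "back_pain"],
]

terms = sorted({t for doc in docs for t in doc})
index = {t: i for i, t in enumerate(terms)}

# Term-by-document matrix of raw counts (rows = terms, columns = documents).
# Assumption: no tf-idf or log-entropy weighting is applied.
X = np.zeros((len(terms), len(docs)))
for j, doc in enumerate(docs):
    for t in doc:
        X[index[t], j] += 1

# SVD of X; NumPy returns singular values in descending order.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2  # assumed number of retained latent dimensions
term_vecs = U[:, :k] * s[:k]  # row i = term i in the reduced space


def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def similarity(t1, t2, vectors):
    """Cosine between the rows of `vectors` for terms t1 and t2."""
    return cosine(vectors[index[t1]], vectors[index[t2]])


raw = similarity("hypertension", "diabetes", X)          # no shared document
lsa = similarity("hypertension", "diabetes", term_vecs)  # linked via "obesity"
print(f"raw cosine: {raw:.3f}, LSA cosine: {lsa:.3f}")
```

On this toy matrix, the two terms' raw rows have a cosine of 0, because they share no document. In the reduced two-dimensional space the cosine is approximately 1, because the leading dimension captures the hypertension-obesity-diabetes cluster as a whole. The choice of k governs this behavior. Keeping every dimension reproduces the raw cosines, while keeping too few can blur distinctions between terms.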
Past research that applied LSA to medical science showed two things. First, LSA can identify shared ontologies across scientific papers, even when the terms have different names. 15 Second, the degree to which concepts are shared across papers in the Proceedings of the National Academy of Sciences can reveal expected patterns. 13 This study adds a new angle to the accumulating literature on LSA in medical contexts. It shows the method's potential contribution to medical science by associating medical terms and ICD codes as they are applied in practice in medical reports. In particular, it adds an ordinal scale of how close terms are to one another relative to other terms. For example, the cosines we use (footnote b) suggest that in this population hypertension is closer to being benign than chronic and even
b https://github.com/jakemiller3/GefenEt-CACM-MedicalLSA/blob/master/Gefen%20et%20al%20Online%20Appendix.pdf
Word cloud of data.