lated to cholesterol- and fat-related
terms than to kidney disorders, based
on cosine distances. Our analysis also
identified relationships to “sciatica”
and to “cerebral ischemia” (“icd4359”),
which involves restricted blood flow in the brain.
The heat map in Figure 1 of cosine
distances between terms shows how
these terms relate to one another.
For example, “urged,” “sympt,” and
“assessment” relate to patient interactions
and are closely associated with diagnosed
mixed hyperlipidemia (“2722”) at the lower
left of the figure. Similarly, several
kidney-related terms and codes are
clustered at the upper right.
Corroboratively, the analysis tied
“hypertension” to “gastroesophageal
reflux disease (GERD),” with 91 unique
cases; these are two seemingly unrelated
health conditions that, only as of March
2017, were found to be related.17
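The pairwise comparison behind such a heat map can be sketched as follows. This is a minimal illustration, not the study's code: the function names `cosine` and `cosine_matrix` are hypothetical, and the three-dimensional vectors are made-up placeholders rather than values from the semantic space. The sketch computes the cosine of the angle between term vectors, where a larger cosine means the terms are closer.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen for this sketch when a vector is all zeros
    return dot / (norm_u * norm_v)

def cosine_matrix(term_vectors):
    """Return (terms, matrix) with matrix[i][j] = cosine(terms[i], terms[j])."""
    terms = sorted(term_vectors)
    matrix = [[cosine(term_vectors[a], term_vectors[b]) for b in terms]
              for a in terms]
    return terms, matrix

# Made-up vectors for illustration only; NOT taken from the study.
toy_vectors = {
    "hypertension": [0.9, 0.1, 0.2],
    "2722": [0.8, 0.2, 0.1],
    "ckd": [0.1, 0.9, 0.3],
}
terms, matrix = cosine_matrix(toy_vectors)
```

In this toy example the row for “2722” holds a larger cosine for “hypertension” than for “ckd,” which is the kind of contrast a heat map makes visible at a glance.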
The cosines we provide show “cardiac”
connecting most strongly to the
first obtuse marginal (OM1) and left
anterior descending (LAD) arteries,
as well as to the saphenous vein graft
(“SVG”) procedure. Perhaps because
it is an adjective, and not a condition,
“cardiac” is associated with terms pertaining to body parts and procedures
more than diagnoses. The heat map in
Figure 2 shows a noticeable cluster of
terms relating to catheterization and
stents (such as “stent,” “stenting,”
“xience,” and “instent”). Clustering
identifies measurements of cardiac
performance as well, with ventricular
activation time (“vat”) and left ventricular end-diastolic pressure (“lvedp”).
This focus, identified through
clustering, is also apparent when the
cosines for “cardiac” are compared with
those for “hypertension” (see footnote b).
The terms nearest to “cardiac” in the
heat map have greater cosine values,
indicating smaller distances, than
the nearest neighbors of “hypertension.”
Such close association implies that
“cardiac” has a more focused meaning in our texts, whereas
“hypertension” is associated with a larger range
of disparate terms, in this case, frequently co-occurring conditions and
diagnoses. This may also be because
cardiac issues are often acute, with
specific actions rendered as treatments, while hypertension is more a
chronic disease, associated with many
related diseases.
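One way to make this comparison concrete is to average the cosines of each term's nearest neighbors. The sketch below is an assumption about how such a summary could be computed, not the authors' procedure; the helper names and the cosine values are hypothetical and invented purely for illustration.

```python
def nearest_neighbors(term, cosines, k):
    """Return the k (neighbor, cosine) pairs with the largest cosine to `term`."""
    ranked = sorted(cosines[term].items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]

def mean_neighbor_cosine(term, cosines, k):
    """Average cosine over the k nearest neighbors (k must be at least 1)."""
    neighbors = nearest_neighbors(term, cosines, k)
    return sum(value for _, value in neighbors) / len(neighbors)

# Made-up cosines for illustration only; NOT results from the study.
toy_cosines = {
    "cardiac": {"stent": 0.82, "lad": 0.80, "svg": 0.78, "ckd": 0.30},
    "hypertension": {"2722": 0.55, "ckd": 0.50, "496": 0.45, "stent": 0.20},
}
cardiac_focus = mean_neighbor_cosine("cardiac", toy_cosines, k=3)
hypertension_focus = mean_neighbor_cosine("hypertension", toy_cosines, k=3)
```

A higher average for one term than another would be consistent with a more focused meaning, though the choice of k is arbitrary and a single average hides how the neighbors are spread.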
Discussion
The analysis in our study dealt with a
relatively small PHI-cleansed sample
of medical-records data from a well-
defined context. We caution against
drawing medical conclusions from
such a sample, but the results are in-
tically close to other terms and diag-
nostic codes (such as “chronic airway
obstruction,” or “496”), kidney dis-
ease (“ckd,” nephropathy, 5852, and
5854), prostate issues (“prostatic” and
code “60000”), and others. The terms
and codes associated with hyperten-
sion in this population are more re-
Figure 3. Outline of the latent semantic analysis process.
[Diagram labels: Medical Record Documents (MRD); LSA; MRD Semantic Space; MRD Terms of Interest; Cosine Distance Matrix. Steps shown: 2. Project terms of interest on the semantic space; 3. Resulting in a cosine distance matrix among terms of interest; 4. Analyzed by experts.]
Figure 2. Clustering of the 40 terms and ICD-9-CM codes closest to “cardiac.”