Applied to almost 3,500 articles, it reveals
computing’s (and Communications’) culture,
identity, and evolution.
By Daniel S. Soper and Ofir Turel
An n-Gram
Analysis of
Gaining a deep, tacit understanding of an institution’s
identity is a challenge, especially when the institution
has a long history, large geographic footprint, or
regular turnover among its employees or members.
Though the cultural themes and historical concepts
that have shaped an institution may be embedded in
its archived documents, the volume
of this material may be too much
for institutional decision makers to
grasp. But without a solid, detailed
understanding of the institution’s
identity, how can they expect to make
fully informed decisions on behalf of
the institution?

key insights

N-gram analysis is a simple but
extremely useful method of extracting
knowledge about an institution’s culture
and identity from its archived historical
documents.

N-gram analysis can reveal surprising
and long-hidden trends that show how
an institution has evolved.

Knowledge gained from n-gram analyses
can substantially improve managerial
decision making.

Many scientific disciplines suffer from this phenomenon, with the
problem especially pronounced in
computing.1 A constant influx of new
technologies, buzzwords, and trends
produces an environment marked by
rapid change, with the resulting instability making it difficult to establish
a stable identity.2,3 As archived institutional artifacts, articles in journals
and other media reflect and chronicle
the field’s evolving identity. Unfortunately, humans are simply unable to
digest it all. However, by leveraging
a computational method known as
n-gram analysis, it may be possible
for computer scientists and scholars alike to unlock the secrets within
these vast collections and gain insight
that might otherwise be lost. If the
history and identity of computing are
encoded on the pages of journals, systematically analyzing them is likely to
yield a better understanding of where
the field has been and where it might