pergraph or meta-network. This hypergraph model of folklore offers the
potential for productively spanning
the distant-reading and close-reading
spectrum explored earlier.
Connections
Connecting texts to one another is one of the most complex tasks of
the hypergraph-generation process. Networks describing the connection
of stories to their tellers and stories to the places they mention are
relatively easy to generate since they are mainly an alternative
representation of the corresponding databases. More complex is
generating <story to story> networks based on multi-objective
criteria. As a case in point, one criterion of our work is not to
abandon earlier systems of classification (such as the tale type,
motif indices, and collection-specific indices) but to incorporate
their information into the story representation. More generally, the
chosen story representation should offer mechanisms for incorporating
accumulated scholarly knowledge. Consequently, our approach is to
model each story as an attribute-valued vector, where the values of
some attributes are known a priori and others are computed. Some
attributes may take as values either simple scalars or “links” to
more complex structures or processes. Other attributes may have
associated with them time-varying functions recording a sequence of
attribute values
over time, or a time series. We did not incorporate time-varying
attributes in our initial limited experimental dataset of stories;
Table 1 lists the attributes for stories in the corpus, and Table 2
lists the attributes of a single story.

Figure 3. Overview of the hypergraph browser for computational folkloristics. The target
ghost story (story ID 340) is upper left, identified with a landmark. The story of the house elf
(nisse) is right of center, also identified with a landmark (story ID 348). The upper navigation
screen shows a tree view of the organized hypergraph; the left panes provide navigation and
selection; the lower right includes a graph-navigation toggle. This hypergraph browser is
based on Abello et al. 2

Figure 4. Overall story space for the test corpus, with a cluster selected for further inquiry.
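The attribute-valued story representation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the class names `Story` and `Link`, the attribute names (`teller_id`, `topic_index`, `text`, `token_count`), and the sample values are all hypothetical placeholders chosen for illustration. The sketch only shows how a priori attributes, computed attributes, links, and (unused here) time series might coexist in one record.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Link:
    """Reference to a more complex structure or process (hypothetical)."""
    target: str


# An attribute value is either a simple scalar or a link.
Value = Union[int, float, str, Link]


@dataclass
class Story:
    story_id: int
    # Attributes known a priori, e.g. taken from existing indices.
    known: Dict[str, Value] = field(default_factory=dict)
    # Attributes derived from the text or from other attributes.
    computed: Dict[str, Value] = field(default_factory=dict)
    # Time-varying attributes as (time, value) sequences; left empty here,
    # mirroring the initial dataset described in the text.
    time_series: Dict[str, List[Tuple[int, Value]]] = field(default_factory=dict)

    def compute(self, name: str, fn: Callable[["Story"], Value]) -> Value:
        """Derive an attribute with fn and store it under name."""
        self.computed[name] = fn(self)
        return self.computed[name]

    def as_vector(self) -> Dict[str, Value]:
        """Merge known and computed attributes into one mapping."""
        return {**self.known, **self.computed}


# Placeholder values only; they are not taken from the corpus.
story = Story(
    story_id=340,
    known={
        "teller_id": 1,
        "topic_index": Link("topic-index/placeholder"),
        "text": "a ghost walks the farm at night",
    },
)
story.compute("token_count", lambda s: len(str(s.known["text"]).split()))
print(story.as_vector()["token_count"])  # prints 7
```

Keeping known and computed attributes in separate mappings is one possible design choice; it makes it easy to recompute derived values without touching information inherited from earlier classification systems.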
The dataset includes 342 storytellers and 942 stories, along with Tang
Kristensen’s topic index for two collections (Danske sagn 33 and Danske
sagn, ny raekke 32) as the basis for the
network. We supplemented this information with two additional weighted
graphs: The first was based on a simple
“shared keyword” weight computed
for each pair of stories. Using AutoMap
we generated a set of 1,201 keywords
shared among all stories across the
corpus, after eliminating pronouns,
prepositions, articles, and conjunctions. 7 This bottom-up approach gave
us an important view of the corpus
based on shared vocabulary across
texts. The second was a top-down approach that categorized stories according to a shallow ontology for the corpus
developed specifically for the realm of
Danish folklore. 35 Such shallow ontologies depend on domain expertise,
providing a view of the corpus attuned
to the “tradition dominants” of a particular tradition group. 15, 21, 26 The use
of natural language processing (NLP)
tools in concert with DanNet, the Danish version of WordNet, may allow us
to develop further the first-generation
shallow ontology for the Danish folklore corpus. 27, 29 Topic-modeling methods (such as Latent Dirichlet Allocation and Latent Semantic Analysis)
could be used as additional methods
for generating a <story to story>
graph. 6, 11, 14 The advantage of the hypergraph model, or multimodal network,
is that new network representations
can be added to the overarching model as their applicability to the study of
folklore is further understood.
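As a rough illustration of how a <story to story> graph might be weighted and then added as one layer of a multimodal network, consider the following Python sketch. The source does not define the "shared keyword" weight precisely, so the sketch assumes the simplest reading: the weight of a pair is the number of keywords the two stories have in common. The function `shared_weights`, the class `StoryNetwork`, and the toy keyword and category sets are hypothetical and are not drawn from AutoMap output or from the corpus.

```python
from itertools import combinations
from typing import Dict, Set, Tuple

Edge = Tuple[int, int]  # (smaller story id, larger story id)


def shared_weights(labels: Dict[int, Set[str]]) -> Dict[Edge, int]:
    """Weight each story pair by the number of labels they share.

    Assumption: weight = size of the intersection of the two label sets.
    Pairs with nothing in common get no edge.
    """
    weights: Dict[Edge, int] = {}
    for a, b in combinations(sorted(labels), 2):
        shared = len(labels[a] & labels[b])
        if shared:
            weights[(a, b)] = shared
    return weights


class StoryNetwork:
    """Named layers of weighted <story to story> edges."""

    def __init__(self) -> None:
        self.layers: Dict[str, Dict[Edge, int]] = {}

    def add_layer(self, name: str, edges: Dict[Edge, int]) -> None:
        self.layers[name] = dict(edges)

    def weight(self, layer: str, a: int, b: int) -> int:
        """Return the edge weight in a layer, or 0 if there is no edge."""
        return self.layers[layer].get((min(a, b), max(a, b)), 0)


# Toy inputs (hypothetical): bottom-up keywords and top-down categories.
keywords = {
    340: {"ghost", "farm", "night"},
    348: {"nisse", "farm", "night"},
    1: {"church"},
}
categories = {340: {"supernatural"}, 348: {"supernatural"}, 1: {"place"}}

net = StoryNetwork()
net.add_layer("shared_keywords", shared_weights(keywords))
net.add_layer("ontology", shared_weights(categories))
print(net.weight("shared_keywords", 348, 340))  # prints 2
```

Because each layer is just a named edge dictionary, a further representation (for example, one derived from topic modeling) could be added with another `add_layer` call without changing the existing layers.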
We hope the end result is a system useful to researchers working with a range of folklore
and ethnographic collections and, ultimately, applicable to any collection or series
of collections of culturally expressive
forms. In our own earlier work, we explored the development of a shallow ontology for the Danish folklore collection
as part of a method for revealing narrative tendencies in the corpus based
on four attributes of the storytellers:
gender, occupation/class, age when
the stories were told, and education. 35