another story, and iterate the process
(see Figure 6). The system keeps the
user-selected landmarks as aggregated
counts that can be used by authorized
story “curators” to update the existing
hypergraph for future analysis. This incorporation of scholarly feedback is essential for the system’s performance,
as it allows for the aggregation of accumulated expert knowledge, which
in turn helps guide researchers as they
navigate the hypergraph.
The graph-theoretic techniques we
applied to generate the final hierarchical graph map involve four major tasks:
• Define similarity measures between stories;
• Construct a weighted graph among stories, where the weight of the connection between two stories is obtained through order-preserving transformations of their similarity measures;
• Decompose the graph(s); and
• Compute a “hierarchy tree” for the obtained graph(s).
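As a rough illustration of the first two tasks, the following Python sketch computes three similarity measures for a pair of stories, maps the distance-based one through a Gaussian kernel, and combines the resulting values into a single edge weight with a Euclidean norm. The toy stories, the attribute counts, the `sigma` and `min_shared` parameters, and the choice of the Euclidean norm are all assumptions made for illustration; scholar-defined weights are omitted, and this is not the system's actual implementation.

```python
import math

def normalize(counts):
    """Turn raw attribute counts into a probability distribution."""
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def hellinger_distance(p, q):
    """Hellinger distance between two discrete distributions (0 = identical)."""
    keys = set(p) | set(q)
    s = sum((math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys)
    return math.sqrt(s / 2.0)

def weighted_jaccard(a, b):
    """Weighted Jaccard coefficient: sum of minima over sum of maxima."""
    keys = set(a) | set(b)
    num = sum(min(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    den = sum(max(a.get(k, 0.0), b.get(k, 0.0)) for k in keys)
    return num / den if den else 0.0

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse attribute vectors."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def gaussian_kernel(distance, sigma=0.5):
    """Monotone map from a distance to a similarity in (0, 1]; sigma is assumed."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))

def edge_weight(a, b):
    """Euclidean norm of the similarity-weights vector for one pair of stories."""
    vector = (
        gaussian_kernel(hellinger_distance(normalize(a), normalize(b))),
        weighted_jaccard(a, b),
        cosine_similarity(a, b),
    )
    return math.sqrt(sum(x * x for x in vector))

def build_weighted_graph(stories, min_shared=1):
    """Connect only pairs that share at least min_shared attributes (assumed threshold)."""
    graph = {}
    names = sorted(stories)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if len(set(stories[u]) & set(stories[v])) >= min_shared:
                graph[(u, v)] = edge_weight(stories[u], stories[v])
    return graph

# Hypothetical stories described by attribute counts.
stories = {
    "ghost_barn": {"ghost": 2, "barn": 1, "farmhand": 1},
    "nisse_feed": {"nisse": 2, "barn": 1, "farmhand": 2},
    "manor_lord": {"manor": 3, "lord": 1},
}
graph = build_weighted_graph(stories)
```

In this toy corpus only the two farm stories share attributes, so they are the only pair joined by an edge; the manor story stays isolated until another measure or a lower threshold connects it.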
The similarity measures we used
include Hellinger distance, 23 weighted
Jaccard coefficient, 19 cosine similarity, 18 and scholar-defined weights. The
measures are transformed through
Gaussian kernels and other standard
kernels 20, 30 that assign to every pair of
stories that share at least a minimum
set of attributes a corresponding similarity weight. In general, the graphs obtained through these transformations
vary depending on the type of similarity measure used. Consequently,
we obtain a final weight between two
stories by computing a norm of their
similarity-weights vector. Because the
obtained graph has different density
“regions,” we follow the approach described in earlier work by Abello et al., 2
decomposing the collection of stories
into maximal sub-graphs according to
their inherent k-connectivity. Each obtained sub-graph is then clustered using an adapted version of Markov Clustering. 2 The entire process is encoded
in a data structure called a “hierarchy
tree” visually represented in the user
interface as a hierarchical graph map
(see Figure 3), 2 constituting one of the
central modes of interaction between
user and story corpus. Each cluster
also carries a unique label string that encodes its
“semantic” placement in the hierarchy
tree. These cluster labels can be used
to exchange feedback with
scholars studying the corpus, as
well as to judge the effectiveness of the
suggested classification.
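The decomposition and labeling steps can be illustrated with a simplified Python sketch. It uses k-core peeling as a stand-in for the k-connectivity decomposition and plain connected components as a stand-in for the adapted Markov Clustering step; both substitutions, the unweighted toy graph, and the `core<k>.cluster<i>` label format are assumptions made for illustration rather than the method described above.

```python
from collections import defaultdict

def core_numbers(adj):
    """Core number of every node, found by repeatedly peeling a minimum-degree node."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        u = min(remaining, key=lambda n: (degree[n], n))
        k = max(k, degree[u])          # core level never decreases while peeling
        core[u] = k
        remaining.remove(u)
        for v in adj[u]:
            if v in remaining:
                degree[v] -= 1
    return core

def components(nodes, adj):
    """Connected components of the sub-graph induced by `nodes`."""
    nodes = set(nodes)
    seen = set()
    result = []
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:
            u = stack.pop()
            comp.append(u)
            for v in adj[u]:
                if v in nodes and v not in seen:
                    seen.add(v)
                    stack.append(v)
        result.append(sorted(comp))
    return result

def hierarchy_labels(adj):
    """Assign each node a label string encoding its level and cluster (assumed format)."""
    shells = defaultdict(list)
    for u, k in core_numbers(adj).items():
        shells[k].append(u)
    labels = {}
    for k in sorted(shells):
        for i, comp in enumerate(components(shells[k], adj)):
            for u in comp:
                labels[u] = f"core{k}.cluster{i}"
    return labels

# Hypothetical story graph: a triangle (a, b, c), a pendant story d, and a separate pair (e, f).
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": {"f"},
    "f": {"e"},
}
labels = hierarchy_labels(adj)
```

Because every label records both the density level and the cluster within that level, two stories with the same label sit in the same leaf of this toy hierarchy, which is the property that makes such labels usable as a shared reference when collecting feedback.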
Performance on the two Folklore tasks
The first task we set for the system was to place the target story in a
neighborhood of closely related ghost
stories. The system succeeded admirably, moving the story from a group of
stories about manor lords in the original classification scheme to a neighborhood of stories about ghosts and
other supernatural beings that haunt
workers in farm buildings and barns—
a new implicit category that did not
exist in Tang Kristensen’s original indices. Although we left the success criteria
for this task loosely defined, the system placed the target
story in the same part of the organized
meta-graph as several ghost stories
exhibiting certain topical similarities; for example, another story in which
a haunt makes it impossible to use
part of the farm appears nearby in the
hypergraph: “A woman died in a farm
in Dokkedal… and she showed herself
every noon at a specific place in one of
the rooms and always as a black shape.
Because of that, the room stood empty
for a long time, since nobody dared go
in there...” 33
Perhaps more interesting is the system’s ability to accomplish the second task:
suggesting candidate stories, given the
researcher’s interest in a target story.
In this case, the visual representation
of the meta-network reveals a close affiliation of the target story with a story
not classified by Tang Kristensen as a
story about ghosts but as a story about
house elves (nisse), a category of supernatural beings not usually considered
thematically related to ghosts: “When
they got home, the farmhand was
happy because now he’d gotten something to use for feed, and afterward
nis [a house elf] could go and feed the
animals just as much as he wanted to.
Then they got another farmhand, and
he didn’t want to let him go on like
that. But he got lifted up in his bed
and all the way up to the rafters, so he
lay there dead when people got up the
next morning.” 33
The intersection between the target
story and this latter story, likewise not