Attributes
Notes
Description
Term frequency
of keywords
A term frequency
count of keywords
selected from
the vocabulary
of the entire corpus;
keywords can
be bigrams.
Place names
A geo-referenced
representation of
places mentioned
in a story and the
place where a story
was collected.
Personal names
An index of the people
mentioned in a story
and the storytellers.
Theory
Recent folklore theory
emphasizes the role of
storytellers in shaping tradition
as part of their own creative
and ideological expression. 31
People mentioned in stories
help situate the stories in
the local social environment.
Computed?
Shallow ontology
Evald Tang
Kristensen index
A shallow, hierarchical ontological
representation of each text
in the corpus. The nine top
categories—Actions or events;
Animals; People; Places;
Resolution; Stylistics; Supernatural Beings; Time, Season,
Weather; Tools, items, and
Conveyances—along with 125
second-level categories, offer
a hierarchical overview of the
content elements of the story.
A topic index to each
of Tang Kristensen’s
published collections.
The underlying theory
of this approach is
known as “bag of words,”
where the vocabulary of
a given text is recognized
as meaning bearing.
The underlying theory of this
approach is predicated on
Vladimir Propp’s Morphology of
the Folktale 28 and, later, Alan
Dundes’s refinement of that
work. 12 Dundes proposed that
individual elements of a story
(allomotifs) often fulfill a structural role in the narrative
(motifemes). The shallow
ontology first developed in
Tangherlini 34 catalogs the stories’
content on multiple levels, allowing comparison across stories
even when the vocabularies of
the stories are divergent.
Nearly all published
folklore collections include
their own topic indices
based on the collector’s
or editor’s evaluation of
“what the story is about.”
In his publications,
Tang Kristensen separated
the collections
first according to
genre, then according
to “top level” topics
(such as “Witches
and their Games”),
then into more specific
categories (such as
“Driving with three
wheels”).
Folklore theory
has always been
concerned with
the relationship
between traditional
expression and
the physical
environment. 22
Inclusion of
geo-referenced place
names here allows
one to explore the
geographic location
of stories in regard
to attributes related
to content.
No.
Benefits
Yes. Keywords were
identified using AutoMap.
The vocabulary of each
individual story was
subjected to a stop word
filter (articles, pronouns,
prepositions, conjunctions)
and rudimentary stemming (Snowball stemmer).
A term-frequency count
for each individual story
was computed based on
this limited vocabulary.
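The steps described here (stop word filtering, stemming, then a per-story term-frequency count) can be sketched roughly as follows. This is an illustration only, not the AutoMap workflow: the stop word list is a tiny hypothetical sample, `crude_stem` is a toy suffix stripper standing in for the Snowball stemmer, and all function names are invented for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list (articles, pronouns,
# prepositions, conjunctions); a real list would be far longer.
STOP_WORDS = {"og", "i", "en", "et", "den", "det", "han", "hun", "på", "til", "af", "der"}

# Toy suffix list standing in for the Snowball stemmer; longest suffixes first.
SUFFIXES = ("erne", "ene", "en", "et", "er", "e")


def crude_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def term_frequencies(text, keywords=None):
    """Count stemmed, non-stop-word tokens; optionally keep only given keyword stems."""
    tokens = re.findall(r"[a-zæøå]+", text.lower())
    stems = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    counts = Counter(stems)
    if keywords is not None:
        # Restrict the count to a corpus-wide keyword vocabulary (assumed to be stems).
        counts = Counter({k: v for k, v in counts.items() if k in keywords})
    return counts


story = "Heksen og heksene red på hesten til kirken"
counts = term_frequencies(story)
```

In this toy run the two inflected forms "heksen" and "heksene" collapse to a single stem, which is the effect that stemming is meant to achieve before counting.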
Partially. The identification and alignment
of place names with
historical place name
gazetteers is a semi-supervised operation;
the de-duplication
of place names in
the corpus was
assisted by DDupe. 4
Partially. Named-entity
detection implemented in
Mallet was used to generate
a list of potential personal
names from the corpus. 24
This list was aligned with
the partial list of personal
names in one of Tang
Kristensen’s indices.
Challenges and drawbacks
Although Danish is not
highly inflected, the
vocabulary needs, at the
very least, to be stemmed.
Umlaut and limited cases of
syncope introduce
inaccuracies into the
stemming. Without
stemming, inflected
forms of words are
counted as discrete
lemmata. Bag-of-words
approaches discard all
important grammatical
information that contributes
to meaning making.
The emphasis in folklore
theory on the role of the
individual in the creation of
tradition and the highly localized
and historicized nature of much
folkloric expression can be
captured by this index.
Many of the places
mentioned in stories
are difficult to resolve
to existing place-name gazetteers.
Some place names
have changed over
the years, some have
ceased to exist, and
others are only known
in local culture.
Small orthographic
variations in
place-name spelling
also contribute to
some uncertainty.
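One way to surface small orthographic variants for manual review is a pairwise string-similarity pass. The sketch below uses Python's standard `difflib` and is not the DDupe tool mentioned above; the function names, the sample names, and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity ratio between two strings (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def candidate_duplicates(names, threshold=0.8):
    """Return pairs of distinct names whose similarity meets the threshold."""
    unique = sorted(set(names))
    pairs = []
    for i, first in enumerate(unique):
        for second in unique[i + 1:]:
            score = similarity(first, second)
            if score >= threshold:
                pairs.append((first, second, round(score, 2)))
    return pairs


# Hypothetical spellings as they might appear across stories.
names = ["Viborg", "Wiborg", "Skive", "Aarhus", "Århus"]
pairs = candidate_duplicates(names)
```

A purely character-based measure flags "Viborg" and "Wiborg" here but misses "Aarhus" and "Århus" at this threshold, which suggests why such a pass can only propose candidates and why the alignment remains semi-supervised.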
Danish names are complex.
A century of at times contradictory laws has led to a situation
where many people have a last
name based on a patronymic
ending in -sen, with a limited
number (approximately 116) of
possible first components (such
as Pedersen from Peder’s son)
and a second last name derived
entirely from place names.
The former phenomenon makes
it difficult to align people
mentioned with external archival
resources (such as a census),
while the latter makes it difficult
to distinguish between place
names and personal names.
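The ambiguity described here can be made concrete with a small sketch that splits a name into a given name, patronymic components ending in -sen, and components that also occur in a place-name list. The function name, the sample name, and the `known_places` set are hypothetical; a real pipeline would draw on a gazetteer and on the named-entity output.

```python
def analyze_name(full_name, known_places):
    """Flag patronymic (-sen) and place-like components of a personal name.

    known_places is assumed to be a set of lowercased place names.
    """
    parts = full_name.split()
    if not parts:
        return {"given": "", "patronymics": [], "place_like": []}
    surnames = parts[1:]
    return {
        "given": parts[0],
        "patronymics": [p for p in surnames if p.lower().endswith("sen")],
        "place_like": [p for p in surnames if p.lower() in known_places],
    }


# Hypothetical place list and name, for illustration only.
places = {"bjerregaard", "viborg"}
result = analyze_name("Peder Jensen Bjerregaard", places)
```

Here "Bjerregaard" is flagged as place-like, so a token seen in isolation could be either a place or part of a person's name; the code only marks the overlap and leaves the decision to a human reader.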