5.1, 3.8; p = 0.0233, 0.0499). The absence
of ecological interpretation by the baseline system was thus judged adversely for
all movement patterns, particularly so
when the birds were relatively static. We
also found the full blogs were rated as
more informative (p<0.0001) and more
engaging (p=0.0215) but not more fluent
(p=0.825) (see Figure 8), confirming hypothesis H5.
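The per-condition statistics reported for the baseline comparison (23 versus 4 trials for C1, 20 versus 8 for C2, and 18 versus 8 for C3) can be approximately reproduced with a chi-square goodness-of-fit test against an even split. The following is a minimal sketch under that assumption; the article does not state which test procedure was used, and the function name `chi_square_two_way` is hypothetical. For one degree of freedom, the p-value equals `erfc(sqrt(x / 2))`, so only the standard library is needed.

```python
import math


def chi_square_two_way(preferred_a: int, preferred_b: int) -> tuple[float, float]:
    """Goodness-of-fit test of two preference counts against a 50/50 split.

    Assumption: the null hypothesis is "no preference", so each option is
    expected to win half of the trials. Returns (statistic, p_value).
    """
    total = preferred_a + preferred_b
    expected = total / 2
    statistic = (
        (preferred_a - expected) ** 2 + (preferred_b - expected) ** 2
    ) / expected
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Trial counts as reported in the text: (full system preferred, baseline preferred).
counts = {"C1": (23, 4), "C2": (20, 8), "C3": (18, 8)}

for condition, (full, baseline) in counts.items():
    stat, p = chi_square_two_way(full, baseline)
    print(f"{condition}: chi2 = {stat:.1f}, p = {p:.4f}")
```

Under the same assumption, the pooled 61-versus-20 split gives a statistic of about 20.8 rather than the reported 21.5, so the overall figure may reflect a different procedure or rounding.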
The two studies we have presented
here demonstrate that computer-generated blogs are appraised more positively than human-written blogs, and
that computer-generated blogs with
creatively generated ecological insights are preferred overwhelmingly to
blogs generated from the same data
but without inclusion of these insights.
Conclusion
The Blogging Birds system shows that
raw satellite tag data can be transformed
into fluent, engaging, and informative
texts directed at members of the public
and in support of nature conservation.
We demonstrated that computers
can compete with human experts in
generating creative stories from numerical data. Unlike natural language generation systems that generate texts for news reporting or for
decision making in the workplace,
Blogging Birds’s narratives are not
entirely factual. Though the system is
constrained by the observed data and
its ecological domain model, the red
kites’ reported foraging and social
behaviors are only imagined to have
taken place. However, including
these behaviors in the narratives allows us to communicate red kite ecology to the reader, and the blogs are
better appraised as a consequence.
Our work thus simultaneously addresses the societal challenges of
communicating data effectively and
engaging the general public with scientific research.
Blogging Birds composes blogs by
combining texts produced through
three different types of analysis: The
first is a generic factual summarization
of telemetric data enriched with
location-specific information about
weather conditions, habitat type, and
geographic features, and can be readily
adapted for use in other domains.
The second is the processing and ecological
interpretation of movement
for “informativeness,” thus illustrating
the difficulty of being informative, engaging,
and fluent at the same time,
even for a human writer. Indeed, all the
writers were committed and used the
full 1.5 hours for composing the blogs,
yet most were outperformed by the computer
on each of the three metrics. For
examples of human-written and computer-generated
blogs, as well as details
of how they were appraised by evaluators,
see the online appendix.
A questionnaire filled out by the blog
writers provided many interesting insights. In general, they found it difficult
to comprehend and summarize the
sheer amount of data in fewer than 200
words but also felt the process became
easier the more blogs they wrote. There was,
however, concern from many that the
blogs were becoming repetitive, especially if there was little variation in what
the red kites were actually doing, stemming largely from a lack of knowledge
of kite ecology and behavior. Summarizing the range of data in different formats was certainly challenging, and
some enjoyed the process more than
others. There was considerable variability in how the blog writers used the materials provided to them to create the
blogs. Some concentrated mostly on the
visible patterns on Google Maps, others
looked in more detail at the map data by
clicking on individual map points to
find out more, and yet others found inspecting the data in a tabular format
was most useful. Asked whether they
would like to write the red kite blogs as a
job, the consensus was that, although
initially enjoyable, the work would quickly become
tedious and it would be increasingly difficult
to write non-repetitive material.
Evaluation against baseline.
Participants demonstrated a conclusive preference for the full system with ecological
insights, preferring it in 61 trials compared to only 20 trials in which the baseline was preferred (χ2 = 21.5; p < 0.001),
confirming hypothesis H4. Interestingly,
this effect was strongest when blogs described situations with little movement
by the birds during those weeks (C1);
here, the full-system blogs were preferred in 23 trials compared to just four
baseline blogs (χ2 = 13.4; p=0.0002). For
C2 and C3, the corresponding values
were preferences for the full system in 20
and 18 trials, compared to preferences
for the baseline in eight trials each (χ2 =
Telemetric data is ubiquitous, captured by smartphones and other mobile devices, as well as through GPS sensors embedded in vehicles used by the transportation industry and others.