has achieved positive results: improving users' task performance,29 establishing trust and likeability in a real estate transaction context,3 improving the naturalness of interactions through appropriate emotions,29 and advancing tutorial systems.11,39 This can largely be attributed to the finding that humans respond to these systems socially, even when the systems themselves are not social. Adding emotional intelligence should only enhance this natural, social response, but more research is needed.
The issue of "social caretaking"6 (that is, using emotional agents to care for the young, infirm, or elderly) is a new field under investigation. Proactive, affective agents have been found to help elderly users feel more comfortable with the technology, and can even ease loneliness to some degree.31 Work by Lucas et al.20 also shows real promise in using conversational agents for clinical interviews: they obtained more honest responses from patients, who showed an increased willingness to disclose because, in certain circumstances, they felt more comfortable talking with an agent than with a human. While researchers in this line of work have shown the benefits of agents, they have also pointed out that humans will engage in racism, lie, feel envy, and more toward emotional agents. Thus, this is a key area to continue exploring as we get better at designing emotional systems.
However, concerns have been raised that the appearance of these embodied emotional agents lacks naturalness, especially in nonverbal gestures and cues such as inaccurate eye gaze or emotional facial gestures.2 If humans begin to mimic or model their interactions on an agent that does not emote appropriately, the result could be negative emotional learning. This issue is of greatest concern in the social caretaking scenarios mentioned above, and especially with children, who model behavior through social learning.1 While the affective modeling community is making great strides in creating more natural, human-like embodied agents with real, human-like communication patterns,39 we have much work to do to allay these concerns.
Robots. Physical systems have advantages over virtual agents. The most obvious is that robotic systems can perform physical actions and tasks in the real world. They can put an arm around a person, for example.
The barrier to entry for creating bots is now much lower, as illustrated by a 14-year-old boy who created his own homework reminder bot.c Many emotional cues are nonverbal, and therefore require an agent to have the ability to express nonverbal emotion. More recent dialogue systems, such as Xiaoice,d leverage text-to-speech technology, allowing for a greater range of expression through voice prosody. Yet the effective synthesis of nonverbal cues is still a very challenging problem. Currently, realistic synthesis of voice tone requires thousands of lines of dialogue to be recorded. Generative machine learning methods may eventually replace the need for this type of labor-intensive data collection and provide realistic voice synthesis.

c http://www.christopherbot.co/
d https://thestack.com/world/2016/02/05/microsoft-xiaoice-turing-test-china/
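To make the prosody point concrete, the following minimal sketch varies speaking rate and volume through the open source pyttsx3 Python library. The library choice and the emotion-to-prosody mapping are our own illustrative assumptions; the article names no toolkit, and real affective synthesis is far richer than this.

# A minimal sketch of affective prosody control in text-to-speech.
# pyttsx3 and the PROSODY mapping are illustrative assumptions, not
# tools or values taken from the article.
import pyttsx3

# Hypothetical mapping from a coarse emotion label to prosody settings.
PROSODY = {
    "excited": {"rate": 190, "volume": 1.0},
    "neutral": {"rate": 150, "volume": 0.8},
    "somber": {"rate": 110, "volume": 0.6},
}

def say_with_affect(text: str, emotion: str = "neutral") -> None:
    """Speak text, shifting rate and volume to suggest an emotion."""
    engine = pyttsx3.init()
    settings = PROSODY.get(emotion, PROSODY["neutral"])
    engine.setProperty("rate", settings["rate"])      # words per minute
    engine.setProperty("volume", settings["volume"])  # 0.0 to 1.0
    engine.say(text)
    engine.runAndWait()

say_with_affect("Your homework reminder is all set!", "excited")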
Virtual agents. While most present-day virtual conversational personal assistants do not rely on emotion recognition or delivery (for example, Siri, Cortana, and others), there is a large literature examining personality and other emotional components of conversational agents, as well as the social and personal benefits that accrue from their use. In their landmark book, The Media Equation, Reeves and Nass30 laid out a communication theory suggesting that humans treat computers and other forms of media as socially as they would another human during conversation. They also claimed that this response is automatic (that is, it requires no conscious effort). Reeves and Nass argued that people respond to what is present in new forms of media, and to their perception of reality, as opposed to what they know to be true (for example, that this is a computer). Among other things, this allows users to assign a personality to a conversational agent. Through a series of studies, Reeves, Nass, and their colleagues showed that politeness, personality, emotion, social roles, and form all influence how humans treat and respond to all kinds of media, including computer systems.
Researchers in the tutoring community11 have shown that emotionally sentient systems enhance the effectiveness of human-computer interaction, and that a lack of emotional responsiveness can reduce performance. Kraemer19 has provided ample evidence of the socio-emotional benefits of pedagogical conversational agents.
A further line of research emphasizes that embodied agents offer several advantages over non-embodied dialogue systems. An agent with a physical presence gives the user something to look at and attend to. Cassell8 has written extensively about this, including how the representation of the agent and its modalities offer greater benefits than the early dreams of ubiquitous computing36 and its goal of embedded (invisible) interaction. Central to her argument is the importance of understanding how humans interact with each other. The human body allows us to "locate" intelligence: not only the typical domain knowledge required, but also the social and interactional information we need about conversational parameters such as turn-taking, taking the floor, interruptions, and more. In this vision, then, an embodied social agent that converses with the user requires less navigation and searching than traditional user interfaces, because the user knows where to find information. Multimodal gestures such as deixis, eye gaze, speech patterns, head nods, and other nonverbal gestures are external manifestations of social intelligence that support trustworthiness.3 For instance, early research showed that to attain conversational clarity, people rely more on gestural cues when their conversations are noisy.32 From this perspective, embodied social agents may be a more natural way for people to interact with computation.
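To make the turn-taking point concrete, here is a toy sketch of how an embodied agent might track who holds the conversational floor. The states, the pause threshold, and all names are invented for illustration; this is not an implementation from Cassell's work.

# Toy model of conversational floor management for an embodied agent.
from enum import Enum, auto

class Floor(Enum):
    USER = auto()   # the user holds the floor; the agent listens
    AGENT = auto()  # the agent holds the floor and may speak
    OPEN = auto()   # nobody is speaking; the floor can be taken

PAUSE_TO_YIELD_SEC = 0.8  # assumed: silence this long opens the floor

class TurnTaker:
    def __init__(self) -> None:
        self.floor = Floor.OPEN

    def on_user_speech(self) -> None:
        # A speaking user claims (or reclaims) the floor; this also
        # models interruption, since the agent should stop talking.
        self.floor = Floor.USER

    def on_silence(self, seconds: float) -> None:
        # A sufficiently long pause yields the floor.
        if seconds >= PAUSE_TO_YIELD_SEC:
            self.floor = Floor.OPEN

    def may_speak(self) -> bool:
        # The agent takes the floor only when it is open.
        if self.floor is Floor.OPEN:
            self.floor = Floor.AGENT
            return True
        return False

agent = TurnTaker()
agent.on_user_speech()
print(agent.may_speak())  # False: the user still holds the floor
agent.on_silence(1.0)
print(agent.may_speak())  # True: the pause opened the floor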
So, conversational agents provide a mental model for the user to start with. Well-designed or anthropomorphic features can then help create a framework of understanding around how to work with these agents. Specifically, conversational agents can provide affordances that signal the available interaction qualities, capabilities, and limits. Our argument is that if designers can tap into users' natural affinity for social interaction with an agent, this will lead to higher levels of affinity for, and interaction with, that agent, which will eventually lead to trust. If we design agents not only to behave as we expect them to, but also to adhere to social norms and values, then we can amplify that trust.12
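As a small illustration of affordances signaling capabilities and limits, the sketch below (invented for this argument; no such agent is described above) has an agent state plainly what it can do when asked for something out of scope, a norm-following behavior of the kind we argue supports trust.

# Illustrative sketch: an agent that exposes its capabilities as
# explicit affordances and refuses out-of-scope requests politely.
# The capability set and wording are assumptions for illustration.
CAPABILITIES = {"set_reminder", "check_weather", "tell_time"}

def respond(intent: str) -> str:
    if intent in CAPABILITIES:
        return f"Sure, I can help with '{intent}'."
    # Stating limits honestly is itself an affordance: it teaches the
    # user what the agent can and cannot do, which supports trust.
    options = ", ".join(sorted(CAPABILITIES))
    return f"Sorry, I can't do that yet. I can help with: {options}."

print(respond("set_reminder"))
print(respond("book_flight"))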
Today, research focusing on virtual
assistants, both embodied and not,