As with facial coding, there is a strong
focus on designing systems that work
outside of lab-based settings. Numerous companies offer software development kits (SDKs) and application programming interfaces (APIs) (for example, BeyondVerbal, audEERING, and Affectiva) that provide prosodic feature extraction and affect prediction. As
with facial expressions, there is likely to
be some level of universality in the perception of emotion in speech (a similar set of “basic” emotions), but a great deal of variability will exist across languages and cultures. Many of these “non-basic” states will be of greater relevance in everyday interactions.
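The SDKs named above typically start from low-level acoustic features computed over short frames of audio. As a minimal sketch of that first stage (not any particular vendor's API; the function name and frame length are illustrative assumptions), the following pure-Python code computes two features commonly used in prosodic analysis, frame-level RMS energy and zero-crossing rate, over a synthetic tone:

```python
import math

def frame_features(samples, sr, frame_ms=25):
    """Compute per-frame RMS energy and zero-crossing rate, two
    low-level features commonly used in prosodic analysis."""
    n = max(1, int(sr * frame_ms / 1000))  # samples per frame
    feats = []
    for start in range(0, len(samples) - n + 1, n):
        frame = samples[start:start + n]
        rms = math.sqrt(sum(x * x for x in frame) / n)
        # Fraction of adjacent sample pairs whose sign differs.
        zcr = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        ) / (n - 1)
        feats.append((rms, zcr))
    return feats

# Synthetic one-second 200 Hz tone at an 8 kHz sampling rate.
sr = 8000
tone = [math.sin(2 * math.pi * 200 * t / sr) for t in range(sr)]
feats = frame_features(tone, sr)
```

An affect classifier would consume trajectories of features like these (along with pitch and spectral measures) rather than the raw waveform.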
Physiology and brain imaging. While
expressed affective signals are those
that are most used in social interactions, physiology plays a significant
role in emotional responses. Innervation of the autonomic nervous system
has an impact on numerous organs in
the body. Computer systems can measure many of these signals in a way
that an unaided human could not.
Brain activity (for example, electroencephalography (EEG), functional near-infrared spectroscopy (fNIRS)), cardiopulmonary
parameters (for example, heart and
respiration rates and variability) and
skin conductance all can be used for
measuring aspects of nervous system
activity. Although wearable devices have seen only partial adoption, there
are several compelling approaches
for measuring cardiovascular (heart)
and pulmonary (breathing) signals
using more ubiquitous hardware. The
accelerometers and gyroscopes on a
cellphone can be used to detect pulse
and breathing signals, and almost any webcam is sufficient to measure the same signals remotely. While people are experienced at applying social controls to
their facial expressions and voice tone,
they do not have the same control
over physiological responses, meaning measurements may feel more intimidating and intrusive to them. One
should be cognizant of these concerns
in the design of agents, as they are
likely to influence how the agents are
perceived, from how likable they are,
to how trustworthy they are.
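At their core, the camera- and accelerometer-based pulse measurements described above reduce to finding a dominant periodicity in the human heart-rate band. The following sketch illustrates that idea on a synthetic "green channel" trace rather than real video; the function name, frequency band, and grid step are assumptions for illustration:

```python
import math

def dominant_frequency(signal, fs, lo=0.7, hi=3.0, step=0.05):
    """Scan candidate frequencies (Hz) in the human heart-rate band
    and return the one whose sinusoidal projection has most power."""
    mean = sum(signal) / len(signal)
    centered = [x - mean for x in signal]
    best_f, best_p = lo, -1.0
    for k in range(int(round((hi - lo) / step)) + 1):
        f = lo + k * step
        re = sum(x * math.cos(2 * math.pi * f * n / fs)
                 for n, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * f * n / fs)
                 for n, x in enumerate(centered))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
    return best_f

# Synthetic 10-second trace sampled at 30 fps: a 1.2 Hz
# (72 beats/min) pulse component plus a slow illumination drift.
fs = 30
trace = [math.sin(2 * math.pi * 1.2 * n / fs) + 0.3 * n / (10 * fs)
         for n in range(10 * fs)]
bpm = 60 * dominant_frequency(trace, fs)
```

Real pipelines add face tracking, detrending, and bandpass filtering before this step, but the final estimate is still a dominant-frequency search of this kind.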
Design challenges for adoption.
Despite the advances in sensing emotions, there remain many challenges in basic objective measurement.
Verbal. Many off-the-shelf tools exist for text and speech sentiment analysis, and they are simple to apply. One can
design a system that analyzes speech or
text for verbal sentiment with a speech-
to-text engine. Designers should be
aware that these systems might not
capture the full complexity of human
language. Though many of these systems are trained on large-scale corpora
that are available to researchers (for
example, Tweets), they may not always
generalize well to other domains (like
email messages).
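A minimal illustration of the lexicon-style scoring such sentiment tools perform is below; the word list and negation handling are toy assumptions, not any shipped model, and a production system would use a classifier learned from large corpora instead:

```python
# A toy lexicon; a production system would use a model trained on
# large corpora rather than a hand-written word list.
LEXICON = {"great": 1.0, "love": 1.0, "helpful": 0.5,
           "slow": -0.5, "broken": -1.0, "hate": -1.0}
NEGATIONS = {"not", "never", "no"}

def sentiment(text):
    """Average lexicon score over tokens, flipping the sign of a
    word that directly follows a negation ("not helpful")."""
    tokens = text.lower().split()
    scores = []
    for i, tok in enumerate(tokens):
        tok = tok.strip(".,!?")
        if tok in LEXICON:
            score = LEXICON[tok]
            if i > 0 and tokens[i - 1].strip(".,!?") in NEGATIONS:
                score = -score
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0
```

The domain-shift caveat in the text applies directly: a lexicon (or model) tuned on Tweets may mis-score the longer, more formal phrasing typical of email.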
Nonverbal. Facial expressions, body
gestures, and posture are some of the
richest sources of affective information. We use automated facial action
coding and expression recognition systems to measure these signals in videos.
Automated facial action coding can be
performed using highly scalable frameworks,23 allowing analysis of extremely
large datasets (for example, millions
of individuals). These analyses have revealed observational evidence of cross-cultural and gender differences in emotional expression23 that for the first time can actually be quantified. Depth-sensing devices like the Kinect sensor significantly advanced pose, gesture, posture, and gait analysis, making it possible to design systems that use off-the-shelf, low-cost hardware. Designers now have
access to software SDKs for automated
facial and gesture coding that are relatively simple to integrate into other applications. These can even be run on resource-constrained devices, enabling mobile applications of facial expression analysis, such as mobile agents that respond to visual cues.
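Whatever SDK produces them, per-frame expression scores are noisy, so a common first step when integrating them is temporal smoothing before the application acts on them. A sketch, assuming hypothetical per-frame probabilities for a single expression rather than any specific SDK's output format:

```python
def detect_events(probs, threshold=0.6, window=5):
    """Smooth per-frame expression probabilities with a moving
    average, then report the frame indices where the smoothed
    value first crosses the threshold (rising edges only)."""
    events, above = [], False
    for i in range(len(probs)):
        lo = max(0, i - window + 1)
        smoothed = sum(probs[lo:i + 1]) / (i + 1 - lo)
        if smoothed >= threshold and not above:
            events.append(i)
            above = True
        elif smoothed < threshold:
            above = False
    return events

# A spurious single-frame spike followed by a sustained expression:
# smoothing suppresses the spike, and only the sustained stretch
# triggers an event.
frames = [0.1, 0.9, 0.1, 0.1, 0.1, 0.7, 0.8, 0.9, 0.8, 0.2, 0.1]
events = detect_events(frames)
```

Firing on rising edges rather than on every above-threshold frame keeps an agent from responding repeatedly to one continuous expression.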
Acknowledging expressions of confusion or frustration from a user’s face
is one practical way that an agent could
make use of facial cues to the benefit of
the interaction. Within a known context (for example, an information-seeking task) it is possible to detect these types of negative expressions when they occur. Generally, responses to incorrectly detected affective states will not frustrate the user if they are able to understand the reasoning that the agent used.24
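One way to make that reasoning understandable is to attach the observed evidence to every affect-triggered response. The sketch below uses hypothetical state names, templates, and evidence strings; it shows the structure of a transparent response, not any particular agent's design:

```python
def respond(state, evidence, context):
    """Pick a response to a detected affective state and expose the
    reasoning, so an incorrect guess remains understandable."""
    templates = {
        "frustration": "It looks like this isn't going smoothly."
                       " Want me to try a different approach?",
        "confusion": "That result may be unclear."
                     " Would a step-by-step view help?",
    }
    if state not in templates:
        return None  # no canned reaction for this state
    return {
        "reply": templates[state],
        # Surfacing the evidence lets the user audit (and forgive)
        # a wrong inference.
        "because": f"I noticed {evidence} while you were {context}.",
    }

msg = respond("confusion", "a furrowed brow", "reading the results")
```

Even when the detector is wrong, the "because" field gives the user enough context to correct the agent rather than abandon it.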
The use of a camera or microphone
for measuring affective signals (whether
in public or private settings) is a particularly sensitive topic, especially if subjects
are not aware the sensors are present
and active. Designers need to carefully
consider how their applications may ultimately influence social norms about
where and when video and audio analysis and recording are acceptable.
Speech prosody. With the rise of
conversational interfaces (such as
Cortana and Siri), nonverbal speech signals are an increasingly valuable source of affective information.