Further development and validation.
We are currently working on BabyX version 4, as in Figure 4, which has a virtual body and is able to control her limbs, with an initial focus on learning to reach and grasp. BabyX version 4 is intended to interact with the public in exhibitions, performing basic learning tasks (such as label learning). For speech, BabyX babbles with a synthesized voice sampled from phonemes produced by a real child. We are implementing techniques so BabyX can learn an acoustic mapping from an arbitrary voice and construct new words using her own voice. Lip shapes are pre-associated with acoustic elements. Our aim is for BabyX to become capable of learning arbitrary sensorimotor sequences, which we theorize map to sentence construction.17
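To make the babbling and lip-shape association concrete, the following is a minimal sketch, assuming a small phoneme inventory pre-associated with lip shapes (visemes); the inventory, names, and consonant-vowel babble pattern are illustrative assumptions, not BabyX's actual speech system.

```python
import random

# Illustrative only: a hypothetical phoneme inventory, each phoneme
# pre-associated with a lip shape (viseme) that drives the facial rig.
PHONEME_TO_VISEME = {
    "b": "lips_closed",
    "m": "lips_closed",
    "d": "tongue_alveolar",
    "a": "jaw_open",
    "u": "lips_rounded",
}

def babble(n_syllables=4, rng=None):
    """Generate a random consonant-vowel babble with its matching viseme track."""
    rng = rng or random.Random()
    consonants, vowels = ["b", "m", "d"], ["a", "u"]
    phonemes = []
    for _ in range(n_syllables):
        phonemes.append(rng.choice(consonants))
        phonemes.append(rng.choice(vowels))
    visemes = [PHONEME_TO_VISEME[p] for p in phonemes]
    return phonemes, visemes

if __name__ == "__main__":
    sounds, lip_shapes = babble(rng=random.Random(0))
    print("babble:    ", "".join(sounds))   # consonant-vowel babble string
    print("lip shapes:", lip_shapes)        # drives the facial animation
```

In this sketch the same table that selects an acoustic element also yields the lip shape that animates the face, which is the sense in which lip shapes are "pre-associated" with acoustic elements.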
In an ongoing developmental psychology study, we are conducting a detailed quantitative characterization of
the microdynamics of early social learning between parents and their infants.
As a first step toward validating the effectiveness of BabyX's behavior at a high level, we will explore how well the model elicits naturalistic responses from parents in a social interaction loop, compared with those elicited by their own or another child. If the model is successful, we will have a new way to study coordinated interaction and how the way we teach infants may play a critical role in learning. Introducing synthetic lesions into the model could be an effective way to explore lower-level validation.
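To illustrate what a synthetic lesion might mean computationally, here is a toy sketch: a component of a small connectionist model is silenced and the change in output is measured. The network, weights, and names are hypothetical stand-ins for whichever pathway one wished to lesion; this is not the BabyX architecture.

```python
import numpy as np

# Toy stand-in for one pathway of a larger model; weights and sizes are arbitrary.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 4))   # stimulus -> hidden
W2 = rng.normal(size=(4, 2))   # hidden -> response

def respond(stimulus, w1, w2):
    hidden = np.tanh(stimulus @ w1)
    return np.tanh(hidden @ w2)

stimulus = rng.normal(size=(1, 8))
healthy = respond(stimulus, W1, W2)

# "Synthetic lesion": silence one hidden unit by zeroing its connections,
# then measure how the model's behavior shifts for the same stimulus.
W1_lesioned, W2_lesioned = W1.copy(), W2.copy()
W1_lesioned[:, 2] = 0.0
W2_lesioned[2, :] = 0.0
lesioned = respond(stimulus, W1_lesioned, W2_lesioned)

print("healthy response: ", healthy)
print("lesioned response:", lesioned)
print("behavioral shift: ", float(np.abs(healthy - lesioned).sum()))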
The Auckland Face Simulator
A developing infant is certainly not the easiest approach to creating an embodied conversational agent for HCI tasks. For this purpose, we are building, on the same underlying computational platform, the “Auckland Face Simulator” (see Figure 8 and Figure 9, as well as Figure 4), also demonstrated at SIGGRAPH,31 to produce highly realistic avatars capable of real-time interaction; for a related video, see https://vimeo.com/128835008.
These faces are designed to be used as stimuli for psychological research but also to provide a realistic interface for third-party virtual-agent and AI applications. The avatars can be “told what to say” using text-to-speech (TTS), and their nonverbal behavior can be specified through a simple API or a custom TTS markup language to add further meaning. Wrinkling the nose or raising the upper lip while speaking can dramatically change the perceived meaning. BL allows internal variables of the avatar’s nervous system to be controlled at any level, from muscles to affective circuits.
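As a rough illustration of what “telling the avatar what to say” with added nonverbal markup could look like, here is a hypothetical client-side sketch; the tag names, payload fields, endpoint, and send() helper are all invented for this example and do not reflect the actual Auckland Face Simulator API or markup language.

```python
import json

# Hypothetical sketch: the utterance is plain text for TTS, with inline tags
# that attach nonverbal behavior to particular spans of speech. Tag names,
# payload fields, the endpoint, and send() are invented for this example;
# they are not the actual Auckland Face Simulator API.
utterance = (
    "I'm not sure that's a good idea "
    "<expr name='nose_wrinkle' intensity='0.6'/>"
    "but we could try it <expr name='smile' intensity='0.4'/>"
)

command = {
    "say": utterance,                              # text-to-speech input with markup
    "gaze": "camera",                              # where the avatar should look
    "affect": {"valence": 0.2, "arousal": 0.5},    # bias internal affective variables
}

def send(endpoint, payload):
    """Stand-in for whatever transport a real integration would use."""
    print(f"POST {endpoint}\n{json.dumps(payload, indent=2)}")

send("http://localhost:8080/avatar/speak", command)
```

The point of the markup is that a nose wrinkle or raised upper lip can be attached to a particular span of speech, so the nonverbal cue changes the perceived meaning of just those words.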
Conclusion
Engaging face to face with an interactive computer model requires autonomy with contextual responsiveness. If visually consistent, realistic appearance and movement seem to increase the sensory intensity of the experience. Internally consistent generative models enable cognitive, affective, and physiological processes to become part of the holistic environment and the interactive experience.
There may be ethical implications as well, and further research is needed to investigate the co-defined dynamic interaction that allows such strong “in the moment” emotional responses, as such responses may have long-term interface implications.
Figure 8. The Auckland Face Simulator is being developed to create realistic and precisely
controllable real-time models of the human face and its expressive dynamics for psychology
research and real-time HCI applications.
Figure 9. The Auckland Face Simulator enables autonomously animated faces to be used for
cinematic-like extreme close-up shots.