spite the recent popularity of biometric authentication in consumer mobile
devices, multimodal biometrics have
had limited penetration in the mobile consumer market.1, 15 This can be
attributed to the concern that users could
find it inconvenient to record multiple
biometrics. Multimodal systems can
also be more difficult to design and
implement than unimodal systems.
However, as we explain, these
problems are solvable. Companies
like Apple and Samsung have invested significantly in integrating biometric sensors (such as cameras and
fingerprint readers) into their products. They can thus deploy multimodal biometrics without substantially
increasing their production costs.
In return, they profit from enhanced
device sales due to increased security
and robustness. In the following sections we discuss how to achieve such
Fusing Face and Voice Biometrics
To illustrate the benefits of multimodal biometrics in consumer mobile devices, we implemented Proteus based
on face and voice biometrics, choosing
these modalities because most mobile devices have cameras and microphones needed for capturing them.
Here, we provide an overview of face- and voice-recognition techniques,
followed by an exploration of the approaches we used to reconcile them.
Face and voice recognition. We used
the face-recognition technique known
as FisherFaces3 in Proteus, as it works
well in situations where images are
captured under varying conditions, as
noisy voice recording can lead a biometric algorithm to incorrectly identify
an impostor as a legitimate user, or “false acceptance.” Likewise, it can
cause the algorithm to declare a legitimate user an impostor, or “false
rejection.” Capturing high-quality samples in mobile devices is especially
difficult for two main reasons. First, mobile users capture biometric samples
in a variety of environmental conditions; factors influencing these conditions
include insufficient lighting, different poses, varying camera angles, and
background noise. Second, biometric sensors in consumer mobile devices often
trade sample quality for portability and lower cost; for example, the
dimensions of an Apple iPhone’s TouchID fingerprint scanner prohibit it from
capturing the entire finger, making it easier to circumvent.
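
The two error types defined above can be made concrete with a small calculation. The following is a minimal sketch, not part of Proteus: the function name `error_rates`, the scores, and the threshold are all hypothetical, and it assumes distance-like matching scores in which a lower value means a closer match, so a sample is accepted when its score is at most the threshold.

```python
def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (false_acceptance_rate, false_rejection_rate).

    Assumption: scores are distance-like, so a sample is accepted
    when score <= threshold and rejected otherwise.
    """
    # An impostor sample that is accepted is a false acceptance.
    false_accepts = sum(1 for s in impostor_scores if s <= threshold)
    # A genuine sample that is rejected is a false rejection.
    false_rejects = sum(1 for s in genuine_scores if s > threshold)
    far = false_accepts / len(impostor_scores)
    frr = false_rejects / len(genuine_scores)
    return far, frr


# Hypothetical scores, invented for illustration only.
genuine = [0.09, 0.16, 0.53, 0.22]   # legitimate user; 0.53 from a noisy sample
impostor = [0.88, 0.67, 0.38, 0.75]  # other people; 0.38 from a noisy sample

far, frr = error_rates(genuine, impostor, threshold=0.5)
print(far, frr)  # prints "0.25 0.25"
```

In this toy data, one low-quality sample on each side is enough to produce one false acceptance and one false rejection, which is the effect of poor sample quality that the passage describes.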
Another challenge is training the
biometric system to recognize the
device user. The training process is
based on extracting discriminative
features from a set of user-supplied
biometric samples. Increasing the
number and variability of training
samples increases identification accuracy. In practice, however, most
consumers likely train their systems
with few samples of limited variability for reasons of convenience. Multimodal biometrics is the key to addressing these challenges.
Promise of Multimodal Biometrics
Due to the presence of multiple pieces of highly independent identifying
information (such as face and voice), multimodal systems can address the
security and robustness challenges confronting today’s mobile unimodal
systems13, 18 that identify people based on a single biometric characteristic.
Moreover, deploying multimodal biometrics on existing mobile devices is
practical; many of them already support face, voice, and fingerprint
recognition. What is needed is a robust user-friendly approach for
consolidating these technologies. Multimodal biometrics in consumer mobile
devices deliver multiple benefits.
Increased mobile security.
Attackers can defeat unimodal biometric
systems by spoofing a single biometric modality used by the system. Establishing identity based on multiple
modalities challenges attackers to
simultaneously spoof multiple independent human traits—a significantly
More robust mobile authentication.
When using multiple biometrics, one
biometric modality can be used to
compensate for variations and quality
deficiencies in the others; for example,
Proteus assesses face-image and voice-recording quality and lets the higher-quality sample have greater impact on
the identification decision.
Likewise, multimodal biometrics
can simplify the device-training process. Rather than provide many training
samples from one modality (as they
often must do in unimodal systems),
users can provide fewer samples from
multiple modalities. This identifying
information can be consolidated to
ensure sufficient training data for reliable identification.
A market ripe with opportunities. De-
Figure 1. Schematic diagram illustrating the Proteus quality-based score-level fusion scheme.
[Figure 1 labels: Face Image, Face Quality. Decision rule: if (S1 w1 + S2 w2 ≤ T) Decision = grant; else Decision = deny.]
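
The decision rule shown in Figure 1 can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the Proteus implementation: the function names `quality_weights` and `fuse_and_decide` are hypothetical, the example numbers are invented, and deriving the weights w1 and w2 by normalizing the two quality estimates so they sum to 1 is an assumption made here for illustration. The comparison follows the rule as printed in the figure (grant when the weighted sum is at most T), which suggests distance-like scores where lower means a closer match.

```python
def quality_weights(face_quality, voice_quality):
    """Turn two non-negative quality estimates into weights that sum to 1.

    Assumption: proportional normalization; the source does not specify
    how Proteus maps quality to weights.
    """
    total = face_quality + voice_quality
    if total == 0:
        # No usable quality information: weight both modalities equally.
        return 0.5, 0.5
    return face_quality / total, voice_quality / total


def fuse_and_decide(face_score, voice_score, face_quality, voice_quality, threshold):
    """Apply the Figure 1 rule: grant if S1*w1 + S2*w2 <= T, else deny."""
    w1, w2 = quality_weights(face_quality, voice_quality)
    fused = face_score * w1 + voice_score * w2
    return "grant" if fused <= threshold else "deny"


# Hypothetical case: a poorly lit face image (weak match, low quality)
# paired with a clean voice recording (strong match, high quality).
print(fuse_and_decide(0.8, 0.2, face_quality=0.2, voice_quality=0.8, threshold=0.4))
# prints "grant"

# Same scores, but with both modalities weighted equally.
print(fuse_and_decide(0.8, 0.2, face_quality=0.5, voice_quality=0.5, threshold=0.4))
# prints "deny"
```

With quality-based weights, the clean voice sample dominates the fused score and the legitimate user is granted access; with equal weights, the degraded face sample pulls the fused score over the threshold.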