Teams may be self-organizing and
cross-functional. Continual learning
and changing requirements based
on new insights also mean continual
change. Weighing which approaches
would be most helpful, and deciding
how to prioritize among different
issues, can benefit from decision support.
Organizations have to agree on
what types of demographics, biases,
and stakeholders to consider in
their algorithmic-bias efforts. In
music contexts, genres and local
music cultures are important
considerations, while they may be less
relevant in other contexts. Trade-offs
between different stakeholders, or
even different user demographics,
may be necessary; it may not be
obvious what the right outcome is.
Communicate minimum viable
product steps. Algorithmic bias
can, to a certain extent, be seen as
technical debt. Bias is much easier
to tackle when working with a new
product rather than one that has
been running for a while, or where
a variety of models are working
together. Unintended biases are
self-reinforcing, recursive, and much
harder to eliminate if ignored at the
beginning. However, in the early
stages of product development, and
in startups, scaling is not an option.
Target users may change over time.
Datasets may be limited by necessity,
as no user data is available. This
also means that the data necessary
to even assess algorithmic bias and
which demographics would need
additional attention is often not
available. Quality evaluations may
depend on the knowledge available
at that early stage. Prioritizing such
speculative endeavors can be at odds
with agile development and its rapid
delivery toward specific user stories
and maximum impact with minimal
investment. Addressing algorithmic
bias in product development thus
may require short-term narrow steps,
with continual improvement paths
forward. It may also require taking
time rather than “moving fast and
breaking things.”
Education. Structuring
algorithmic-bias assessment
methods in a manner that fits into
prioritization processes in teams may
be more fruitful than an approach
that is presented as extra work.
Getting organized using a
shared framework becomes even more
important to be able to deal with such
cases.
From checklist to case study: voice.
Beyond general methods, we also need
methods for specific types of products
and specific issues we encounter in
practice. One such example is voice
applications. While voice interfaces
are not new, their use at scale across
diverse applications is relatively
recent. In specialized
and creative domains, this leads to
issues. For example, in music, artists
may name themselves with symbols
(e.g., M SC RA) that most standard
automated speech recognition (ASR)
systems cannot transcribe. Some
have alternative spellings, such
as the track “Hot in Herre” or the
artist 6LACK, pronounced “black.”
Code switching between languages
is also common, but most ASR
systems are trained on data from one
main locale. Especially when wide
developer communities start to rely
on standard, non-domain-specific
ASR APIs trained on large-scale data
from other application types, this can
have serious consequences. Certain
types of content become much less
accessible (Figure 4).
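The mismatch above can be made concrete with a small sketch. The alias table and function below are purely illustrative (no real system's API is implied): a general-purpose recognizer will emit standard spellings such as "hot in here" or "black," so a lookup from likely transcripts to catalog spellings is one lightweight way to recover the intended entity.

```python
# Hypothetical sketch: mapping normalized ASR transcripts onto catalog
# entries whose creative spellings a general-purpose recognizer will
# never produce. Names and data are illustrative examples from the text.
PRONUNCIATION_ALIASES = {
    "hot in here": "Hot in Herre",  # ASR outputs the standard spelling
    "black": "6LACK",               # artist name pronounced "black"
}

def resolve_entity(asr_transcript: str) -> str:
    """Map a normalized ASR transcript to the catalog spelling, if known."""
    key = asr_transcript.lower().strip()
    return PRONUNCIATION_ALIASES.get(key, asr_transcript)

print(resolve_entity("hot in here"))  # -> Hot in Herre
print(resolve_entity("black"))        # -> 6LACK
```

A real deployment would need many more aliases per entity and a notion of confidence, but the lookup illustrates why content with nonstandard spellings becomes unreachable when only the standard transcript is matched.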
To get a firmer grasp on potential
issues, we developed a way to detect
such content at scale by detecting
unexpected differences in content
engagement, or popularity, between
modalities [10]. Categorizing the
underserved content resulted
in a typology of linguistic
practices that can make
content harder to surface. For music,
we found that genres such as hip-hop
and country were disproportionately
affected, as creative language
usage and code switching between
languages are a valued part of these
subcultures. After detection, it is
possible to correct these issues by
collecting multiple pronunciations
from users without prior knowledge,
or via crowdsourcing, and having
these transcribed. Artists’ intended
pronunciation and spelling may not
match those of users, crowdworkers,
or the ASR transcription.
Crowdsourced pronunciations are
thus not guaranteed to include the
right pronunciation per se, but they
do provide a scalable and relatively
easy way to correct some inaccessible
content.
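The detection idea described above can be sketched as follows. This is a minimal illustration of the approach, not the method of [10]: the function names, the expected voice share, and the threshold are all hypothetical, and a real analysis would control for many more factors.

```python
# Illustrative sketch: flag catalog items whose share of voice-initiated
# plays falls unexpectedly far below a baseline expectation, suggesting
# the item is hard to reach by voice. All numbers here are hypothetical.

def voice_gap(plays_voice: int, plays_other: int,
              expected_voice_share: float = 0.2) -> float:
    """How far an item's voice share falls below the expected share."""
    total = plays_voice + plays_other
    if total == 0:
        return 0.0
    return expected_voice_share - plays_voice / total

def flag_underserved(catalog: dict, threshold: float = 0.15) -> list:
    """Return items that are popular overall but rarely played via voice."""
    return [name for name, (voice, other) in catalog.items()
            if voice_gap(voice, other) >= threshold]

catalog = {
    "Hot in Herre": (10, 990),   # popular, but voice requests rarely succeed
    "Plain Title":  (200, 800),  # voice share matches the baseline
}
print(flag_underserved(catalog))  # -> ['Hot in Herre']
```

Even a crude comparison like this surfaces candidates for the manual categorization that produced the typology, without requiring any changes to the ASR system itself.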
More complex models could have
been built to address each of the
11 types of issues we found. However,
these would require more resources
and changes to infrastructure. This
points to the need to value not only
the most complex models in the
literature but also easy-to-apply solutions and
sociotechnical work to understand
team challenges.
IMPACTING PRODUCTS:
ORGANIZATIONAL
COMMUNICATION
AND EDUCATION
Developing methods to address
biases is not enough: Considerable
organizational work may have to
be done as well. While cultural
changes are needed to address biases
early on, a deep understanding
of the challenges that teams face
in their day-to-day practice will
help make algorithmic-bias efforts
more successful. This includes
developing lightweight tools
that support decision making,
ensuring that methods fit rapid-delivery engineering practices, and
breaking down the daunting task
of addressing algorithmic bias into
smaller pieces. It also includes a
conscious effort to educate and to
iterate on potential tools provided
to teams. In larger organizations,
it means both resourcing research
efforts as well as evangelization and
supporting teams in their specific
questions.
Aligning on priorities and
interdependencies. Teams have
competing demands and must deal
with changing circumstances. In
agile contexts, teams may be self-organizing and cross-functional.