Product teams who own data-driven platforms
or features are continuously faced
with decisions about data collection,
maintenance, and modeling. In
order for algorithmic-bias efforts
to succeed, teams must have
support for making these decisions
thoughtfully and intentionally. It
is also crucial to assess data quality
and understand the appropriate
measures of effectiveness. At least
three complementary types of effort
are required (Figure 1). The first is
research and analysis to know how to
assess and address bias. This involves
translating existing research into the organizational context, as well as
case studies into specific products.
The second is developing processes
that are easy to integrate into
existing product cycles. This requires
organizational work: education
and organizational coordination.
The third is engaging with external
communities to exchange lessons
learned and ensure that the work
done internally keeps up with the
state of the art. Each of these comes with specific challenges that must be addressed within the product,
technical, and organizational
contexts.
PILOTING GENERAL AND
FEATURE-SPECIFIC METHODS
Teams need concrete recommendations and methods that fit specific
contexts. This includes general
methods or checklists that are shared
across organizations, and application- or modality-specific methods (Figure 2). Checklists provide a shared framework for approaching and communicating about data and outcome characteristics, and lay the foundation for a structured approach across teams.
General methods: from research
to checklist. Very human decisions
affect machine-learning outcomes. A
team’s success criteria, the data it collects, the models it uses, and the people it involves in quality assessment can all be steered but are not always fully planned out.
To allow teams to make the right
decisions, they must know what
questions to ask. In our case, we
evaluated a selection of existing bias
frameworks (including [1,8,9]) and translated them into an easy-to-digest summary of bias types, while adding examples from within our own organization as a
start to assessing both intended
data and algorithm characteristics
and unintended unfair biases. This
includes a checklist of questions teams can ask while making data-collection and modeling decisions and while assessing outcomes, as well as a product case
study involving voice. We discuss
lessons learned along the way and
offer suggestions for others seeking
guidelines and recommendations for
best practices for the mitigation of
algorithmic bias and negative effects
of ML in consumer-facing product
development.
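As a rough illustration of how such a checklist can become a working artifact rather than a static document, the sketch below encodes a few stage-specific questions in Python. The questions are paraphrased from the themes above (data collection, modeling, outcome assessment), and the structure is a hypothetical example, not the checklist we use internally.

from dataclasses import dataclass

@dataclass
class ChecklistItem:
    stage: str        # "data collection", "modeling", or "outcome assessment"
    question: str
    answer: str = ""  # filled in by the owning team during review

# Illustrative questions only; a real checklist would be far more complete.
BIAS_CHECKLIST = [
    ChecklistItem("data collection",
                  "Which populations are over- or under-represented in the training data?"),
    ChecklistItem("modeling",
                  "Which skews are intended, and which would count as unintended bias?"),
    ChecklistItem("outcome assessment",
                  "Are quality metrics reported per group, or only in aggregate?"),
]

def unanswered(items):
    """Return the checklist items that still need an answer before review."""
    return [item for item in items if not item.answer.strip()]

for item in unanswered(BIAS_CHECKLIST):
    print(f"[{item.stage}] {item.question}")

Encoding the checklist as data rather than prose makes it easier to attach answers to a specific product release and to flag unanswered questions during review.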
TRANSLATING
RESEARCH INTO PRAGMATIC
DECISION SUPPORT
Over the past several years, machine
learning (ML) has developed into
a ubiquitous and powerful tool. By
finding patterns in thousands or
millions of training examples, ML
systems help us keep our inboxes
tidy by identifying spam, provide
personalized recommendations for
content to consume and buy, and even
analyze medical images to potentially
help doctors detect cancer. Bias in
these contexts refers to an inherent
skew in a model that causes it to
underperform in a systematic way.
In research on algorithmic fairness,
bias is sometimes defined as unfair
discrimination: negative impacts of
ML efforts that unfairly (dis)favor
particular groups or individuals.
Alternatively, it can be framed as
the characteristics—the biases—a
system and its data have, some
intended and some unintended. In
the latter interpretation, all data
and algorithms have biases. Curated
datasets are always an approximation
to the true underlying distribution.
The challenge for product teams
then is to make decisions that lead
to intended behavior, and to both foresee and counter unintended
consequences. Research is not enough
to make this happen in practice; we
need to bridge the gap between the
academic literature and product
decisions.
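One concrete way to make such unintended skews visible is to disaggregate an existing quality metric by group instead of reporting only an aggregate number. The minimal sketch below assumes a simple accuracy metric and illustrative group labels; the metric and the grouping that actually matter depend entirely on the product.

from collections import defaultdict

def accuracy_by_group(y_true, y_pred, groups):
    """Compute overall accuracy and per-group accuracy for paired predictions."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    overall = sum(correct.values()) / sum(total.values())
    per_group = {g: correct[g] / total[g] for g in total}
    return overall, per_group

# Toy example: an acceptable aggregate number can hide a group that is
# systematically underserved.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
groups = ["group_a"] * 4 + ["group_b"] * 4

overall, per_group = accuracy_by_group(y_true, y_pred, groups)
print(f"overall accuracy: {overall:.2f}")
for group, acc in sorted(per_group.items()):
    print(f"{group}: {acc:.2f}")  # a persistent gap signals a systematic skew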
While general awareness of the
power and reach of ML, the reliance
of ML on data, and the possibility
of bias has increased recently in
the media, these issues have been
discussed for over 20 years within
HCI and related areas. Batya
Friedman and Helen Nissenbaum
[1] presented a prescient taxonomy
of biases in computational systems
in 1996. Since then, the body of
literature has been steadily growing.
Research communities spanning
industry and academia, such as
FATML [2], have matured into dedicated FAT* venues [3]. Microsoft,
Google, and Facebook have all
announced research into algorithmic
bias. Initiatives such as AI Now [4]
are gaining deserved attention with
guidelines for algorithmic impact
assessment. Research organizations
like Data & Society [5] and mainstays like the World Wide Web Foundation [6] and ACM are presenting
accountability primers and helpful
principles. In the wake of Europe’s
cookie and GDPR legislation, the
unforeseen consequences of machine
learning have also received political
attention.
A proactive approach is important,
but it can be difficult to operationalize the literature into step-by-step processes and to reconcile methods with gritty
on-the-ground demands. As ML
becomes more ubiquitous, guidelines
for practitioners such as Gebru et al.’s
datasheets [7] become increasingly
relevant. Developing processes to
assess, and certainly to address,
issues within organization-specific
contexts still requires work.
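As one pragmatic starting point in that direction, documentation in the spirit of datasheets [7] can travel with the dataset itself. The sketch below is a heavily abridged, hypothetical template; the field names are illustrative assumptions, and the full set of questions a datasheet should answer is defined in [7].

from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class DatasetDatasheet:
    """Abridged, illustrative datasheet-style record attached to a curated dataset."""
    name: str
    motivation: str           # why the dataset was created
    collection_process: str   # how, and from whom, the data was collected
    known_skews: List[str]    # populations or contexts that are under-represented
    intended_uses: List[str]
    cautioned_uses: List[str]

# Hypothetical example; the dataset name and contents are invented for illustration.
voice_sample_sheet = DatasetDatasheet(
    name="voice-query-sample-v1",
    motivation="Assess voice-interface quality across speaker groups.",
    collection_process="Opt-in voice queries, manually transcribed.",
    known_skews=["few non-native accents represented", "adult speakers only"],
    intended_uses=["per-group quality assessment"],
    cautioned_uses=["training without rebalancing accent coverage"],
)

Keeping even a lightweight record like this next to the data gives later teams a place to check intended characteristics and known skews before reusing it.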