nation—advertiser bias in placing ads
or society bias in selecting ads.
Discrimination, however, is at the
heart of online advertising. Differential delivery is the very idea behind it.
For example, if young women with children tend to purchase baby products
and retired men with bass boats tend
to purchase fishing supplies, and you
know the viewer is one of these two
types, then it is more efficient to offer ads for baby products to the young
mother and fishing rods to the fisherman, not the other way around.
On the other hand, not all discrimination is desirable. Societies have
identified groups of people to protect
from specific forms of discrimination.
Delivering ads suggestive of arrest
much more often for searches of black-identifying names than for white-identifying names is an example of
unwanted discrimination, according
to American social and legal norms.
This is especially true because the ads
appear regardless of whether actual arrest records exist for the names in the
The good news is that we can use the
mechanics and legal criteria described
earlier to build technology that distinguishes between desirable and undesirable discrimination in ad delivery.
Here I detail the four key components:
1. Identifying Affected Groups. A set
of predicates can be defined to identify
members of protected and comparison
groups. Given an ad’s search string and
text, a predicate returns true if the ad
can impact the group that is the subject of the predicate and returns false
otherwise. Statistics of baby names can
identify first names for constructing
race and gender groups and last names
for grouping some ethnicities. Special
word lists or functions that report degree of membership may be helpful for
In this study, ads appeared on
searches of full names for real people,
and first names assigned to more black
or white babies formed groups for testing. These black and white predicates
evaluate to true or false based on the
first name of the search string.
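A minimal sketch of component 1 follows, assuming name statistics arrive as a mapping from each first name to per-group birth counts. The input shape, the helper names `build_first_name_groups` and `make_first_name_predicate`, and the placeholder names `NameA`/`NameB` are all hypothetical and are not data from the study. A name joins whichever group received it more often, mirroring "first names assigned to more black or white babies". The predicate checks only the first token of the search string.

```python
from typing import Callable, Dict, Set

# Hypothetical input shape: first name -> {group label -> number of babies given that name}.
# Real counts would come from published baby-name statistics; none are included here.
NameCounts = Dict[str, Dict[str, int]]


def build_first_name_groups(counts: NameCounts, group_a: str, group_b: str) -> Dict[str, Set[str]]:
    """Put each first name in whichever of the two groups received it more often.

    Names given equally often to both groups are left out of both sets.
    """
    groups: Dict[str, Set[str]] = {group_a: set(), group_b: set()}
    for name, by_group in counts.items():
        a = by_group.get(group_a, 0)
        b = by_group.get(group_b, 0)
        if a > b:
            groups[group_a].add(name.lower())
        elif b > a:
            groups[group_b].add(name.lower())
    return groups


def make_first_name_predicate(first_names: Set[str]) -> Callable[[str], bool]:
    """Return a predicate that is true when the search string's first token is in first_names."""

    def predicate(search_string: str) -> bool:
        tokens = search_string.split()
        return bool(tokens) and tokens[0].lower() in first_names

    return predicate


# Usage with placeholder names (not data from the study).
counts = {
    "NameA": {"black": 900, "white": 100},
    "NameB": {"black": 50, "white": 800},
}
groups = build_first_name_groups(counts, "black", "white")
is_black = make_first_name_predicate(groups["black"])
is_white = make_first_name_predicate(groups["white"])
print(is_black("NameA Doe"), is_white("NameA Doe"))  # True False
```

A "degree of membership" variant could return the fraction `a / (a + b)` instead of a hard set, leaving the threshold to the analyst.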
2. Specifying the Scope of Ads to Assess. The focus should be on those
ads capable of impacting a protected
group in a form of discrimination prohibited by law or social norm. Protection typically concerns the ability to
give or withhold benefits, facilities, services, employment, or opportunities.
Instead of lumping all ads together, it
is better to use search strings, ad texts,
products, or URLs that display with ads
to decide which ads to assess.
This study assessed search strings
of first and last names of real people,
ads for public records, and ads having
a specific display URL (instantcheckmate.com), the latter being the most
informative because the adverse ads all
had the same display URL.
Of course, the audience for the ads
is not necessarily the people who are
the subject of the ads. In this study, the
audience is a person inquiring about
the person whose name is the subject
of the ad. This distinction is important when thinking about the identity
of groups that might be impacted by
an ad. Group membership is based on
the ad’s search string and text. The audience may respond more strongly to an ad that characterizes the group in a distinctly positive or negative way.
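Component 2 amounts to a filter over ad records. Here is a minimal sketch, assuming each ad is captured with its triggering search string, its text, and its display URL. The `Ad` record, the `select_ads` and `normalize_url` helpers, and the second example URL are hypothetical. An ad is kept only if it matches every criterion supplied, whether a display URL such as instantcheckmate.com, a phrase in the ad text, or both.

```python
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class Ad:
    search_string: str  # query that triggered the ad, e.g. a person's full name
    text: str           # ad title and body text
    display_url: str    # URL displayed with the ad


def normalize_url(url: str) -> str:
    """Lower-case a display URL and drop any scheme, leading 'www.', and trailing '/'."""
    url = url.strip().lower()
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url.rstrip("/")


def select_ads(ads: Iterable[Ad], display_urls: Sequence[str] = (), keywords: Sequence[str] = ()) -> List[Ad]:
    """Keep ads matching every criterion supplied; an empty criterion is not applied."""
    wanted_urls = {normalize_url(u) for u in display_urls}
    wanted_words = [k.lower() for k in keywords]
    selected = []
    for ad in ads:
        if wanted_urls and normalize_url(ad.display_url) not in wanted_urls:
            continue
        if wanted_words and not any(k in ad.text.lower() for k in wanted_words):
            continue
        selected.append(ad)
    return selected


ads = [
    Ad("NameA Doe", "Public records for NameA Doe", "www.instantcheckmate.com"),
    Ad("NameB Doe", "Fishing rods on sale", "https://shop.example.com/"),
]
print(len(select_ads(ads, display_urls=["instantcheckmate.com"])))  # 1
```

Normalizing URLs before comparison keeps trivial variants such as a leading `www.` from splitting one advertiser into several.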
3. Determining Ad Sentiment.
Originally associated with summarizing
product and movie reviews, sentiment
analysis is an area of computer science
that uses natural-language processing and text analytics to determine the
overall attitude of a piece of writing.13
Sentiment analysis can measure whether an
ad’s search string and accompanying
text have positive, negative, or neutral
sentiment. A literature search found no
prior application to online ads,
but much research has assessed sentiment in social media (sentiment140.com, for example, reports
the sentiment of tweets, which, like advertisements, have a limited number of words).
In this study, ads containing the
word “arrest” or “criminal” were classified as
having negative sentiment; ads without
those words were classified as neutral.
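The study's classifier is a simple keyword rule, which can be sketched as follows. The names `Sentiment`, `NEGATIVE_TERMS`, and `classify_ad` are hypothetical. Matching uses case-insensitive substrings, which is an assumption made so that inflected forms such as "Arrested" count as containing "arrest". A general-purpose sentiment analyzer would add a positive class and go beyond a word list.

```python
from enum import Enum
from typing import Sequence


class Sentiment(Enum):
    NEGATIVE = "negative"
    NEUTRAL = "neutral"


# The two terms named in the study; matched as case-insensitive substrings
# (an assumption) so that "Arrested" also counts as containing "arrest".
NEGATIVE_TERMS = ("arrest", "criminal")


def classify_ad(text: str, negative_terms: Sequence[str] = NEGATIVE_TERMS) -> Sentiment:
    """Label an ad negative if its text contains any negative term, otherwise neutral."""
    lowered = text.lower()
    if any(term in lowered for term in negative_terms):
        return Sentiment.NEGATIVE
    return Sentiment.NEUTRAL


print(classify_ad("NameA Doe, Arrested?").value)  # negative
print(classify_ad("We found NameA Doe").value)    # neutral
```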
4. Testing for Adverse Impact.
Consider a table where columns are comparison groups, rows are sentiments,
and values are the number of ad impressions (the number of times an
ad appears, though the ad is not necessarily clicked). Ignore neutral ads.
Comparing the percentage of ads having the same positive or negative sentiment across groups reveals the degree
to which one group may be impacted
more or less by the ad’s sentiment.
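Here is a minimal sketch of component 4, assuming each impression has already been reduced to a `(group, sentiment)` pair. The helper names `impression_table` and `group_percentages` are hypothetical. The code builds the sentiment-by-group table, drops neutral impressions, and then reports, for each remaining sentiment row, the percentage of that row's impressions falling on each group. This row-wise reading is one interpretation of the comparison. It is only meaningful if each group was searched a comparable number of times; otherwise, counts should first be normalized by searches per group.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def impression_table(impressions: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Build rows = sentiment, columns = group, values = impression counts.

    Each impression is a (group, sentiment) pair; neutral impressions are ignored.
    """
    table: Dict[str, Counter] = {}
    for group, sentiment in impressions:
        if sentiment == "neutral":
            continue
        table.setdefault(sentiment, Counter())[group] += 1
    return table


def group_percentages(table: Dict[str, Counter], groups: Sequence[str] = ()) -> Dict[str, Dict[str, float]]:
    """For each sentiment row, the percentage of that row's impressions falling on each group."""
    result: Dict[str, Dict[str, float]] = {}
    for sentiment, row in table.items():
        total = sum(row.values())  # > 0, since a row exists only after a count is added
        all_groups = sorted(set(row) | set(groups))  # listed groups with no impressions get 0.0
        result[sentiment] = {g: 100.0 * row[g] / total for g in all_groups}
    return result


# Illustrative counts only, not results from the study.
impressions = [("black", "negative")] * 3 + [("white", "negative")] + [("white", "neutral")] * 5
print(group_percentages(impression_table(impressions), groups=["black", "white"]))
# {'negative': {'black': 75.0, 'white': 25.0}}
```

Passing the full list of groups ensures that a group receiving no negative impressions still appears with 0.0 rather than silently disappearing from the comparison.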