the unique information from the document so as to obtain higher ranking
for queries pertaining to the common
information. This means that valuable information in the corpus is lost,
and search effectiveness for queries
which target the unique information
will consequently degrade. Overall,
the potential satisfaction of users’
information need with respect to the
corpus will decrease.
Indeed, we showed, using a game-theoretic
analysis, that the PRP is
suboptimal as it does not promote
content breadth in the corpus in
terms of topical coverage. 2 In other
words, retrieval methods based on the
PRP do not provide a strong enough
incentive for publishers to produce diversified content. It also turns out that
introducing randomness to ranking
functions can help to promote content breadth in the corpus, and hence,
increase the overall attainable utility
for search engine users. We will discuss the suboptimality of the PRP and
the merits of using nondeterministic
ranking functions later.
Given the significant impact of induced rankings on the content in the
corpus, due to the actions employed
by incentivized publishers who respond to these rankings, the natural
challenge that arises is analyzing the
strategic behavior of publishers. Previous work has characterized search
engine optimization (SEO) techniques
intended to promote documents in
rankings. 15 We also note that content
dynamics on the Web has been studied and
analyzed (for example, Raifer et al. 32
and Santos et al. 36). However, the post-ranking perspective has not actually been modeled or analyzed; that is, the
specific types of responses of publishers to rankings and, more generally,
their ranking-motivated strategies
in terms of document manipulation
have not been studied.
Strategic publishers. Since the early
days of the Web, different types of SEO
techniques have been identified. 15
For example, publishers can “stuff”
keywords in their Web pages so as to
promote them in rankings induced for
queries that include these keywords.
The underlying (often correct) assumption
is that increased similarity between
the query and the Web page increases
the retrieval score of the page.
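To make the keyword-stuffing assumption concrete, the following minimal sketch scores a page against a query using cosine similarity over raw term counts. The scoring function, the query, and the page text are illustrative assumptions; they do not correspond to the ranking function of any actual search engine.

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term counts for a whitespace-tokenized, lowercased text."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical query and page, chosen only for illustration.
query = "cheap flights"
page = "compare airline deals and book travel online"

# "Stuffing": append the query keywords to the page, twice.
stuffed = page + " cheap flights cheap flights"

score_before = cosine(tf_vector(query), tf_vector(page))
score_after = cosine(tf_vector(query), tf_vector(stuffed))
print(score_before, score_after)
```

In this toy setting the unmodified page shares no terms with the query and scores 0, while the stuffed version scores higher, which is the effect that stuffing relies on.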
There is a fundamentally important question that goes beyond the actual
general actions that publishers use as part of their SEO efforts: What is
the strategic behavior of the publishers with respect to induced rankings?
In other words, given that they do not know what the ranking function is, [a]
but they can observe past rankings induced for queries of interest, what
would be an effective response strategy to rankings? [b] We have recently
addressed this question using a game-theoretic analysis. 32 Our main
theoretical result was that a “worthwhile” strategy for publishers who want to
promote their documents in rankings induced for a specific query is to make
their documents become more similar to those highly ranked in the past
for the query. By “worthwhile” we refer to the fact that this strategy results in
a certain equilibrium in a game-theoretic modeling of the retrieval setting
wherein publishers are players and the ranking function is a mediator, not
exposed to the players, which induces rankings for queries.
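As an illustration only, a "mimic the winner" response can be sketched as moving a document's term weights toward those of the previously top-ranked document. The function name `mimic_winner`, the interpolation weight `alpha`, and the toy term weights are hypothetical; this is not the game-theoretic model analyzed in the cited work, only a way to see what "becoming more similar" can mean for term vectors.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mimic_winner(doc, winner, alpha):
    """Hypothetical response: interpolate doc toward winner.

    alpha = 0 leaves the document unchanged; alpha = 1 copies the winner.
    """
    terms = set(doc) | set(winner)
    return {
        t: (1 - alpha) * doc.get(t, 0.0) + alpha * winner.get(t, 0.0)
        for t in terms
    }

# Toy term weights (assumed): the publisher's document and the past winner.
doc = {"hotel": 2.0, "paris": 1.0, "cheap": 1.0}
winner = {"hotel": 1.0, "paris": 2.0, "booking": 2.0}

revised = mimic_winner(doc, winner, alpha=0.5)
sim_before = cosine(doc, winner)
sim_after = cosine(revised, winner)
print(round(sim_before, 3), round(sim_after, 3))
```

With these toy weights the revised document is closer to the winner than the original was, and setting `alpha` to 1 reproduces the winner exactly.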
The intuitive theoretical finding that
mimicking the “winner” from previous
rankings is worthwhile was supported
by analyzing the strategic publishing
behavior of students who served as
publishers in a content-based ranking
competition we organized. We provide
details of this competition later.
Addressing unwarranted effects of
the dynamics. As discussed, a major
part of the dynamics of a corpus in
competitive retrieval settings is driven
by incentivized publishers. The dynamics can have undesirable effects, specifically, in terms of degrading retrieval
effectiveness. For example, some Web pages could be spam that is intended
to be promoted in rankings and to attract clicks, that is, black-hat SEO. 15
At the same time, corpus dynamics is driven to a major extent by white-hat
SEO efforts, which need not necessarily hurt retrieval effectiveness; these are
legitimate actions applied to Web pages so as to promote them in rankings.
However, even white-hat SEO can lead to undesirable effects; for example,
rapid changes to the relative ranking of documents due to indiscernible
document manipulation. As a result, users might respond by consistently
reformulating their queries or simply losing faith in the search engine.
[a] There are efforts to reverse engineer ranking functions but these are often of quite limited success.
[b] A response in terms of manipulating document content can be, for example, at the “micro-level”: selecting the terms to add or remove, or at the “macro-level”: making a document more similar to another document. 32
Ranking robustness: A blessing or a
curse? Following the arguments just
posed, one should presumably opt
for ranking robustness; that is, small
indiscernible changes of documents
should not result in major changes to
induced rankings. 14 Some support for
this argument can be drawn using one
of the most fundamental hypotheses
in the field of information retrieval,
namely, the cluster hypothesis, 18 as we
recently described. 14 Additional support can be drawn from arguments
made in work on adversarial classification, specifically, in the vision domain. The main premise in this line
of work was that small indiscernible
(adversarial) changes of objects should
not result in changes to classification
decisions. 11, 13 Thus, while changes
of rankings in our setting that reflect
discernible changes of, or differences
between, documents are naturally very
important, changes of rankings that
are “harder to explain” (a concern related to explainable IR) are less warranted.
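One possible way to quantify how much a ranking changes after a small document modification is the fraction of document pairs whose relative order flips (a normalized Kendall distance). The sketch below is an assumption made for illustration: the scores are invented and this is not necessarily the robustness measure used in the cited work.

```python
from itertools import combinations

def ranking(scores):
    """Rank documents by descending score; ties are broken by document id."""
    return sorted(scores, key=lambda d: (-scores[d], d))

def kendall_distance(rank_a, rank_b):
    """Fraction of document pairs ordered differently in the two rankings."""
    pos_a = {d: i for i, d in enumerate(rank_a)}
    pos_b = {d: i for i, d in enumerate(rank_b)}
    pairs = list(combinations(rank_a, 2))
    flipped = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return flipped / len(pairs) if pairs else 0.0

# Invented retrieval scores before and after a tiny edit to document d2.
before = {"d1": 0.90, "d2": 0.89, "d3": 0.50, "d4": 0.20}
after = dict(before, d2=0.91)

rank_before = ranking(before)
rank_after = ranking(after)
change = kendall_distance(rank_before, rank_after)
print(rank_before, rank_after, change)
```

Here a score change of 0.02 swaps the top two documents, so one of the six document pairs is reordered. A robust ranking function would keep this fraction small whenever the underlying document change is indiscernible.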
On the other hand, there are arguments for hurting ranking robustness,
to some extent, for special purposes.
For example, recent work showed that
introducing randomization to induced
rankings over time, which hurts robustness, can increase fairness with
respect to publishers of Web pages. 7
Specifically, given that users browsing
the search results page mainly pay attention to the top-ranked results, 19 allowing Web pages to be positioned at
the highest ranks even if they are not
among the most relevant, can increase
the attention given to other publishers and hence promote fairness. Another justification for hurting ranking robustness is promoting content