various relevance estimates so as to
improve overall relevance estimation
that guides ranking. Neural-network-based retrieval methods estimate relevance (often probability) by learning
document and query representations
and/or integrating multiple relevance
signals. 28 As feature-based approaches
they can be viewed as obeying the PRP.
The PRP was shown to be optimal under some assumptions; for example,
the independence of document relevance from that of others. 33
Post-ranking dynamics. Careful
examination of the PRP, and retrieval
methods based on the PRP, reveals a
major gap: post-ranking effects are
not accounted for; specifically, changes in the corpus documents made by
incentivized publishers that respond
to the induced ranking so as to improve the ranking of their documents.
We focus on publishers (content providers) as those affected by, and affecting, the dynamics. We do not consider
the dynamics driven by users of the
systems, for example, via clickthrough
actions. Later in the article, we discuss the importance, challenges, and
potential future directions of accounting simultaneously for publisher and
user-driven dynamics.
To highlight the importance of ac-
counting for post-ranking effects on
corpus content consider the following
example. 2 A publisher writes a docu-
ment that contains both common in-
formation and information which is
unique to the document; that is, this
unique information cannot be found
in any other document in the cor-
pus. The publisher, for some reason,
is more interested in her document
being highly ranked for queries that
touch on the non-unique information.
However, the ranking function “penal-
izes” documents having a mixture of
information (that is, not being focused
on the information sought for in these
queries). In terms of the PRP and cur-
rent retrieval paradigms the approach
of the ranking function is justifiable:
one can assume that the user who
posted the query would prefer read-
ing documents that are focused on in-
formation related to the query rather
than having to wade through poten-
tially long documents and distill the
sought information. Now, suppose
that the publisher decides to remove
We discuss some of the fundamen-
tal, and far reaching, modeling and al-
gorithmic implications of the competi-
tive and incentivized settings in which
search engines operate. Specifically, we
examine post-ranking corpus effects
and survey our work on a game-theo-
retic analysis of the strategic behavior
of publishers (content providers) and
on addressing ranking robustness. We
also briefly describe the challenges of
empirical evaluation and analysis in
competitive search settings. Later, we
focus on our theoretical work on the
suboptimality in competitive settings
of the most basic search (ranking) and
recommendation principle and we sur-
vey our work on settings where incen-
tivized parties are ranking algorithms
and prediction algorithms (regression)
that compete with each other.
Search Engines
The main goal of search engines is to
rank documents in a corpus by their
presumed relevance to the information need expressed by a query posted
by a user. There has been a huge body
of work on devising relevance estimates for documents. For example,
the similarity between term-based
vectors representing documents and
queries serves as a relevance estimate in the well-known vector space
model. 35 Modern relevance ranking
functions are learned (trained) using
a training set which contains queries
and corresponding relevance judgments of documents. 25, 28 These approaches allow to integrate various
types of relevance estimates.
The theoretical foundation of virtually all ad hoc retrieval paradigms, classical and modern, is the probability
ranking principle (PRP): 33 Documents
are ranked by the probability that they
are relevant to the query wherein the
probability is estimated using all the
information available to the search
system. The PRP is the direct basis of
the probabilistic retrieval approach20
and was shown to underlie the basic
language modeling approach. 23
Feature-based learning-to-rank methods
can also be viewed as obeying the PRP,
even if relevance probability is not
directly estimated. That is, these approaches are trained to minimize ranking loss with respect to ground truth
ranking and they essentially integrate
We believe
the design
of search and
recommendation
methods must be
revolutionized to
account for the
strategic incentives
of the various
parties that affect
(or are affected by)
the system.