In a billion-dollar market, this difference could be dramatic.
DO HUMANS MINIMIZE REGRET?
Recognizing that the classic Nash equilibrium may be too strict to describe
practical environments, researchers have suggested other solution concepts
as alternatives for analyzing game outcomes. A particularly attractive
alternative for the repeated game is the set of no-regret learning outcomes,
which assume players minimize regret in the specific sense used in
the regret-minimization literature [5] (sometimes known as “Hannan
consistent”). This notion assumes players use no-regret strategies
by which, over time, they manage to achieve at least as much utility
as they could have gotten from
playing any fixed action repeatedly.
This assumption is strictly weaker
than the Nash equilibrium assumption. Certainly, if the players reach a
Nash equilibrium, they must all be
minimizing their regrets. But the same
holds even if the players reach any
equilibrium from the much wider
families of correlated or even coarse-correlated equilibria (where all players have non-positive regrets), and
furthermore, there are many natural
dynamics that are known to indeed
minimize regret in the long run.
Thus, according to these theoretical
predictions, it is not clear why the truthful VCG auction—where there is no
need for strategic manipulations and it
is safe for a player to bid her true value—
is not commonly used in ad auctions.
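The no-regret notion described above can be made concrete with a small calculation. The following is a minimal sketch, not the procedure used in the experiment: the function name `average_regret` and the utility table in the usage lines are hypothetical, and it assumes we know what each fixed action would have earned in every round, holding the other players' actions as they were.

```python
from typing import Sequence


def average_regret(utilities: Sequence[Sequence[float]],
                   played: Sequence[int]) -> float:
    """Average (external) regret of one player over a repeated game.

    utilities[t][a] is the utility action `a` would have earned in round t,
    holding the other players' actions fixed (an assumption of this sketch).
    played[t] is the index of the action actually chosen in round t.
    """
    rounds = len(played)
    if rounds == 0 or len(utilities) != rounds:
        raise ValueError("need one utility row per played round")
    n_actions = len(utilities[0])

    # Utility the player actually collected.
    realized = sum(utilities[t][played[t]] for t in range(rounds))

    # Utility of the best single action played in every round, in hindsight.
    best_fixed = max(
        sum(utilities[t][a] for t in range(rounds))
        for a in range(n_actions)
    )

    # Positive: some fixed action would have done better.
    # Non-positive: the player did at least as well as every fixed action.
    return (best_fixed - realized) / rounds


if __name__ == "__main__":
    # Hypothetical two-action example over four rounds.
    table = [[1, 0], [0, 1], [1, 0], [1, 0]]
    print(average_regret(table, [1, 1, 1, 1]))  # always playing action 1
    print(average_regret(table, [0, 1, 0, 0]))  # best response every round
```

A player is said to have no regret when this quantity is non-positive, or tends to a non-positive value as the number of rounds grows.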
In order to shed some light on the
relevance of the theoretical predictions to human players, we experimentally evaluated how people behave in GSP and VCG ad auctions in
controlled lab simulations (see Noti et al. [4]).
In each experimental session, five human participants played the roles of advertisers and
competed in a sequence of 1,500 auctions (all either GSP or VCG) for five
ad positions presented in decreasing
order of (commonly known) CTRs.
The participants could modify their
bids at any time, and every second an
auction was performed with the current set of five bids. Each player was
assigned a “type” at random, which
was the monetary value she obtained
from each “click” on her ad. As in real
auctions, players did not know the
values of the others nor the bids that
the others submitted. The income
and payment from every auction were
applied to each player’s private balance, and the players knew that at the
end of the session they would be paid
proportionally to their final balance.
For the full details of the experiment,
see Noti et al. [4].
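To make the two mechanisms concrete, here is a minimal sketch of the standard textbook payment rules for a position auction. The text above does not spell out these rules or the CTRs used in the experiment, so the function name `position_auction`, the CTR values in the usage lines, and the tie-breaking by bidder index are all assumptions made for illustration. The per-click values 21, 27, 33, 39, and 45 are the player types reported in Figure 1.

```python
from typing import List, Sequence, Tuple


def position_auction(bids: Sequence[float],
                     ctrs: Sequence[float],
                     rule: str) -> Tuple[List[int], List[float]]:
    """Run one position auction and return (order, payments).

    order[s] is the index of the bidder placed in slot s (highest bid first).
    payments[i] is bidder i's expected payment per auction (CTR x price per click).
    ctrs must be in decreasing order, with at most one slot per bidder.
    """
    n = len(bids)
    if len(ctrs) > n:
        raise ValueError("this sketch assumes at most one slot per bidder")

    # Highest bid gets the best slot; ties go to the lower index (assumption).
    order = sorted(range(n), key=lambda i: -bids[i])
    sorted_bids = [bids[i] for i in order] + [0.0]      # sentinel bid of 0
    alphas = list(ctrs) + [0.0] * (n + 1 - len(ctrs))   # missing slots have CTR 0

    payments = [0.0] * n
    for slot, bidder in enumerate(order):
        if slot >= len(ctrs):
            continue  # no slot won, nothing to pay
        if rule == "GSP":
            # Pay the next-highest bid for every click received.
            payments[bidder] = alphas[slot] * sorted_bids[slot + 1]
        elif rule == "VCG":
            # Pay the welfare loss imposed on the bidders ranked below.
            payments[bidder] = sum(
                (alphas[j - 1] - alphas[j]) * sorted_bids[j]
                for j in range(slot + 1, n + 1)
            )
        else:
            raise ValueError("rule must be 'GSP' or 'VCG'")
    return order, payments


if __name__ == "__main__":
    values = [21, 27, 33, 39, 45]        # player types from Figure 1
    ctrs = [0.5, 0.4, 0.3, 0.2, 0.1]     # hypothetical CTRs, not from the experiment
    for rule in ("GSP", "VCG"):
        order, payments = position_auction(values, ctrs, rule)
        print(rule, order, payments)
```

With identical bids, each GSP payment is at least as large as the corresponding VCG payment, which is one way to see why bidders in GSP are expected to bid below their values.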
Among the various behaviors we
observed, some were in line with theory. For example, bids in the VCG auction were on average close to the true
values, while in the GSP auction bidders did, as expected, bid lower than
their value. In addition, both auction
mechanisms achieved high levels of
social welfare, indicating that, in general,
the auction efficiently allocated the
better positions to the players with the
higher values.
However, other results of our experiment clearly deviated from the rational predictions in significant respects.
Most notably, there seemed to be no
convergence to Nash equilibrium, and
in fact the frequency with which players modified their bids increased with
time. Moreover, bidders in VCG were
not truthful—even when they were explicitly given their own valuation and
the truthfulness property was explained, fewer than 20 percent of
the bids were the true values. Finally,
looking at the bottom line of how the
“pie” (i.e., the social welfare) was divided between the advertisers and the
search engine, we found the search
engine captured a total of 76 percent
of the social welfare in GSP sessions
compared to 68 percent in VCG sessions. This is in contrast to the 63 percent
the search engine would have captured had players played according to
the theoretically predicted outcome.
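The division of the “pie” is a simple ratio of total payments to social welfare. The sketch below assumes the usual definition of social welfare in a position auction (each player's value per click times the CTR of the slot she holds); the function name `revenue_share` and the numbers in the usage lines are hypothetical and are not data from the experiment.

```python
from typing import Sequence


def revenue_share(payments: Sequence[float],
                  values: Sequence[float],
                  ctr_won: Sequence[float]) -> float:
    """Fraction of the social welfare captured by the auctioneer.

    payments[i] is the expected payment of player i,
    values[i]   is player i's value per click,
    ctr_won[i]  is the CTR of the slot player i holds (0 if none).
    """
    # Social welfare: total expected value generated by the allocation.
    welfare = sum(v * c for v, c in zip(values, ctr_won))
    if welfare <= 0:
        raise ValueError("welfare must be positive to compute a share")
    return sum(payments) / welfare


if __name__ == "__main__":
    # Hypothetical single-auction numbers, for illustration only.
    share = revenue_share(payments=[12.0, 8.1, 4.8, 2.1, 0.0],
                          values=[45, 39, 33, 27, 21],
                          ctr_won=[0.5, 0.4, 0.3, 0.2, 0.1])
    print(f"search engine share: {share:.0%}")
```

For a whole session, the same ratio would be taken over payments and welfare summed across all auctions rather than within a single one.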
Figure 1: The regret results of the human players in the ad auction experiment of Noti et al. (2014), according to player types
(values of 21, 27, 33, 39, and 45 “game coins” per click).
(a) Relative momentary regret over time. (b) Relative total regret.