a coarse correlated equilibrium, coarse correlated equilibria
are “even easier” to compute and learn, and are thus a still
more plausible prediction for the realized play of a game.
We now give our extension theorem for equilibrium
concepts in one-shot games: every POA bound proved via a
smoothness argument extends automatically to the set of
coarse correlated equilibria. With the “correct” definitions
in hand, the proof writes itself.
Theorem 3.1 (Extension Theorem—Static Version). For every cost-minimization game G with robust POA r(G), every coarse correlated equilibrium σ of G, and every outcome s* of G,

E_{s∼σ}[C(s)] ≤ r(G) · C(s*).

Proof: Let G be a (λ, μ)-smooth cost-minimization game, σ a coarse correlated equilibrium, and s* an outcome of G. We can write

E_{s∼σ}[C(s)] = E_{s∼σ}[Σ_i C_i(s)]            (10)
             = Σ_i E_{s∼σ}[C_i(s)]            (11)
             ≤ Σ_i E_{s∼σ}[C_i(s*_i, s_{−i})]  (12)
             = E_{s∼σ}[Σ_i C_i(s*_i, s_{−i})]  (13)
             ≤ E_{s∼σ}[λ · C(s*) + μ · C(s)]   (14)
             = λ · C(s*) + μ · E_{s∼σ}[C(s)],  (15)

where equality (10) follows from the definition of the objective function, equalities (11), (13), and (15) follow from linearity of expectation, inequality (12) follows from the definition (9) of a coarse correlated equilibrium (applied once per player, with the hypothetical deviation s*_i), and inequality (14) follows from the assumption that the game is (λ, μ)-smooth. Rearranging terms yields E_{s∼σ}[C(s)] ≤ (λ/(1 − μ)) · C(s*), which completes the proof.

3.2. Repeated play and no-regret sequences
The extension theorem (Theorem 3.1) applies equally well to certain outcome sequences generated by repeated play, because of a well-known correspondence between such sequences and static equilibrium concepts. To illustrate this point, consider a sequence s^1, s^2, …, s^T of outcomes of a (λ, μ)-smooth game and a minimum-cost outcome s* of the game. For each i and t, define

δ_i(s^t) = C_i(s^t) − C_i(s*_i, s^t_{−i})   (16)

as the hypothetical improvement in player i's cost at time t had it used the strategy s*_i in place of s^t_i. When s^t is a Nash equilibrium, δ_i(s^t) cannot be positive; for an arbitrary outcome s^t, δ_i(s^t) can be positive or negative. We can mimic the derivation in (3)–(5) to obtain

C(s^t) ≤ (λ/(1 − μ)) · C(s*) + (1/(1 − μ)) · Σ_i δ_i(s^t)   (17)

for each t. Suppose that every player i experiences vanishing average (external) regret, meaning that its cost over time is competitive with that of every time-invariant strategy:

(1/T) Σ_{t=1}^T C_i(s^t) ≤ min_{s'_i} [(1/T) Σ_{t=1}^T C_i(s'_i, s^t_{−i})] + o(1).   (18)

Repeating the same pure Nash equilibrium over and over again yields a degenerate example, but in general such sequences can exhibit highly oscillatory behavior over arbitrarily large time horizons (see, e.g., Blum et al.3 and Kleinberg et al.).

Averaging (17) over the T time steps and reversing the order of the resulting double summation yields

(1/T) Σ_{t=1}^T C(s^t) ≤ (λ/(1 − μ)) · C(s*) + (1/(1 − μ)) · Σ_i [(1/T) Σ_{t=1}^T δ_i(s^t)].   (19)

Recalling from (16) that δ_i(s^t) is the additional cost incurred by player i at time t due to playing strategy s^t_i instead of the (time-invariant) strategy s*_i, the no-regret guarantee (18) implies that (1/T) Σ_{t=1}^T δ_i(s^t) is bounded above by a term that goes to 0 with T. Since this holds for every player i, inequality (19) implies that the average cost of outcomes in the sequence is no more than the robust POA times the minimum-possible cost, plus an error term that approaches zero as T → ∞.

Theorem 3.2 (Extension Theorem—Repeated Version). For every cost-minimization game G with robust POA r(G), every outcome sequence s^1, …, s^T that satisfies (18) for every player, and every outcome s* of G,

(1/T) Σ_{t=1}^T C(s^t) ≤ r(G) · C(s*) + o(1)

as T → ∞.

Blum et al.3 were the first to consider bounds of this type, calling them "the price of total anarchy."

We reiterate that the type of bound in Theorem 3.2 is significantly more compelling, and assumes much less from both the game and its participants, than the one that applies only to Nash equilibria. While Nash equilibria can be intractable or impossible to find, there are several computationally efficient "off-the-shelf" learning algorithms with good convergence rates that are guaranteed, in any game, to generate outcome sequences with vanishing average regret (see, e.g., Chapter 4 of Nisan et al.17). Of course, the guarantee in Theorem 3.2 makes no reference to which learning algorithms (if any) the players use to play the game—the bound applies whenever repeated joint play has low regret, whatever the reason.