Figure 5 presents the exploitability of our historical
agents and their average loss in games played against
Cepheus. To reduce the impact of luck, a duplicate poker
format was used, in which each game is played twice with
the same cards but with the players in opposite positions.
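To make the variance reduction concrete, here is a minimal sketch of duplicate scoring in Python (the function name and result encoding are our own illustration, not the harness actually used; results are the evaluated agent's winnings in mbb):

    def duplicate_winrate(paired_results):
        # paired_results: list of (seat1_mbb, seat2_mbb) pairs, where both
        # games use the same shuffled cards but the evaluated agent sits in
        # opposite positions. Averaging each pair cancels most card luck.
        scores = [(a + b) / 2.0 for a, b in paired_results]
        return sum(scores) / len(scores)  # average winnings in mbb/g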
PsOpti4 was the first game-theoretic strategy produced for
HULHE,5 and was also the University of Alberta entry to
the 2006 ACPC.h The University of Alberta entries to the
ACPC were named Hyperborean, and from 2007 onwards,
all were created using variants of CFR.i The Polaris 2007
and 2008 agents were created by the University of Alberta
for its two Man-vs.-Machine Poker Championship matches,
in which Polaris narrowly lost in 2007 and narrowly won in
2008; an analysis of these matches is available in24 [Chapter
8]. Finally, the CFR-BR agent was our closest equilibrium
approximation prior to this work.21 It used the same abstract
game as Hyperborean 2011, but used an algorithm that
solved for the abstract strategy with the lowest real-game
exploitability.
These results show that, with the exception of Hyper-
borean 2009, each new generation of strategies improved in
both exploitability and loss against an essentially optimal
strategy. However, even though many of these strategies were
highly exploitable, the rate at which they lose to Cepheus is
quite low. This loss is difficult to measure with statistical
confidence: a 100,000-game (non-duplicate) match would
have a 95% confidence interval of 31mbb/g, larger than the
performance difference between Cepheus and every agent
but PsOpti4.
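The quoted interval follows from the usual normal approximation, with half-width 1.96σ/√n. As a sketch (the per-game standard deviation of roughly 5,000 mbb/g is our assumption, a typical figure for non-duplicate HULHE that reproduces the quoted half-width; it is not a number reported here):

    import math

    def ci_halfwidth(sigma_mbb, num_games, z=1.96):
        # Half-width of a 95% confidence interval under a normal approximation.
        return z * sigma_mbb / math.sqrt(num_games)

    print(ci_halfwidth(sigma_mbb=5000, num_games=100000))  # ~31.0 mbb/g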
Further, Hyperborean 2009 did improve over
its predecessors in terms of in-game performance against
Cepheus, but regressed in exploitability due to its use of
“Strategy Grafting,” an unsound solving technique that
solves an abstraction as a series of fragments.50 This tech-
nique allows for a much larger and finer-grained abstraction
than would otherwise be feasible, resulting in improved in-
game performance, but without theoretical guarantees on
exploitability. Together, these results illustrate the difficulty
of evaluating a strategy only through its competition perfor-
mance, rather than by calculating its exploitability.
We can also measure Cepheus’ performance against
human adversaries. After this article was first published in
January 2015, our website allowed visitors to play against
Cepheus and inspect its strategy.8 Each visitor chose a
username and played any number of short 100-game matches
against Cepheus. Over the last two years, 39,564 unique
usernames have played 98,040 matches, with 3,564,094 total
games played.j Over this set of games, Cepheus is winning
at a rate of 169.9 ± 5.2mbb/g with 95% confidence. However,
most of the players did not finish a single 100-game match
(only 7,878 players did so, with 20,374 completed matches in
total), and so this winrate is likely not reflective of Cepheus’
performance against strong opponents.
Determining which of these players are strong is nontrivial
because of both the variance in their matches and the
unequal number of games played by each player. While both
luck and skill contribute to a player’s performance, the
highest-scoring players are more likely to be the luckiest rather
than the strongest. Additionally, bias may be introduced if
players keep playing while ahead, but quit if they are losing.
In order to limit the impact of bias and evaluate Cepheus’
performance against different tiers of humans, we used
the following method. First, we eliminated usernames with
insufficient data, those that had played fewer than 500 games,
leaving 821 usernames that played 33,752 matches comprising
1,765,656 games. Next, we divided each username’s games
into two sets, called Rank and Test.k Each username’s Rank
games were evaluated, and the resulting winrates were used
to sort the players by performance. This ordering reflected
both their skill and their luck. The players were then divided
equally into five tiers: the bottom 20% of usernames, the
next 20% (21–40%), and so on. Within each tier, the Test
game results were averaged to produce a winrate for the
tier, independent of the luck that affected the Rank games.
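A sketch of this tiering procedure (hypothetical code: the Rank/Test assignment follows footnote k, but the data layout and all names are our own):

    def tier_winrates(results_by_user, num_tiers=5, min_games=500):
        # results_by_user: {username: [mbb, ...]}, each user's winnings per
        # game against Cepheus, in the order the games were played.
        rank_mean, test_games = {}, {}
        for user, results in results_by_user.items():
            if len(results) < min_games:  # drop users with insufficient data
                continue
            # Footnote k: in each block of four sequential games, one
            # duplicate pair goes to Rank and the other to Test.
            rank = [r for i, r in enumerate(results) if i % 4 < 2]
            test = [r for i, r in enumerate(results) if i % 4 >= 2]
            rank_mean[user] = sum(rank) / len(rank)
            test_games[user] = test

        # Sort users by Rank winrate (skill plus luck), split into equal
        # tiers, and score each tier on its Test games only.
        ordered = sorted(rank_mean, key=rank_mean.get)
        n = len(ordered)
        tiers = []
        for k in range(num_tiers):
            members = ordered[k * n // num_tiers : (k + 1) * n // num_tiers]
            pooled = [r for u in members for r in test_games[u]]
            tiers.append(sum(pooled) / len(pooled))
        return tiers  # users' mean mbb/g per tier; negate for Cepheus' winrate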
These results are shown in Figure 6. Cepheus’ estimated
winrate varies from 225 to 87mbb/g as we advance through
the tiers, decreasing as the quality of the human opponents
improves.
Name              Year  Exploitability  vs. Cepheus
PsOpti4           2003  –               74.9 ± 23.7
Hyperborean 2007  2007  298.106         27.4 ± 2.9
Polaris 2007      2007  275.880         26.2 ± 3.0
Hyperborean 2008  2008  266.797         22.5 ± 2.7
Polaris 2008      2008  235.294         22.2 ± 2.6
Hyperborean 2009  2009  440.823         18.9 ± 2.6
Hyperborean 2010  2010  135.427         10.8 ± 2.5
Hyperborean 2011  2011  106.035         8.0 ± 2.4
CFR-BR            2012  37.113          9.2 ± 2.6
Cepheus           2014  0.986           0
Figure 5. Exploitability and performance against Cepheus for earlier
computer strategies. Results are in mbb/g, and indicate the expected
winnings by the strategy’s opponent (a best response or Cepheus,
respectively). The Cepheus matches involved 1 million games of
duplicate poker (2 million games total), except for PsOpti4, which
played 20,000 duplicate games (40,000 games total).
h PsOpti4 acts too slowly for an exploitability calculation to be practical,
or for a long match against Cepheus.
i In the inaugural 2006 ACPC, PsOpti4 was the core component of the Hyperborean entry.
j Many players quit before finishing the 100-game match.
k In each block of four sequential games, one pair (played in each position)
was assigned to each set.