NOVEMBER 2017 | VOL. 60 | NO. 11 | COMMUNICATIONS OF THE ACM 85
strategy caps the betting less than 0.01%, and the hand most
likely to cap is a pair of twos, with probability 0.06%. Perhaps
more importantly, the strategy chooses to play, that is, not
fold, a broader range of hands as the non-dealer than most
human players (see the relatively small amount of red in
Figure 4b). It is also much more likely to re-raise when holding
a low-rank pair (such as threes or fours).g
While these observations are only for one example of
game-theoretically optimal play (different Nash equilibria
may play differently), they both confirm and contradict
current human beliefs about equilibrium play, and illustrate
that humans can learn considerably from such large-scale
5. IN-GAME RESULTS
In this extended version of the original paper,9 we present
additional results measuring Cepheus’ in-game performance against computer agents and human opponents.
HULHE has served as a common testbed for artificial intelligence research for more than a decade, and researchers
have produced a long series of computer agents for the
domain. This effort was largely coordinated by the Annual
Computer Poker Competition (ACPC), which began in 2006
with HULHE. While each year’s top agents outperformed the
older agents in the competition, and so appeared to be converging to optimal play, their actual worst-case exploitability
was unknown. In 2011, an efficient best response technique
was developed that made it feasible to measure a computer
agent’s exploitability,23 and for the first time researchers
were able to exactly measure their progress towards the goal
of solving the game. A key result in that paper was that top
ACPC agents only defeated each other by tiny margins, and
yet had a wide range of exploitability. Using Cepheus, we can
now also evaluate these historical agents through matches
against an essentially optimal strategy.
of Random Access Memory (RAM), and a 1TB local disk.
We divided the game into 110,565 subgames (partitioned
based on preflop betting, flop cards, and flop betting). The
subgames were split among 199 worker nodes, with one
parent node responsible for the initial portion of the game-tree. The worker nodes performed their updates in parallel,
passing values back to the parent node for it to perform its
update, taking 61 min on average to complete one iteration.
The computation was then run for 1,579 iterations, taking
68.5 days, and using a total of 900 core years of computationf
and 10.9TB of disk space, including file system overhead
from the large number of files.
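As a rough consistency check on the figures above, the arithmetic can be sketched in a few lines of Python. The constants are the numbers quoted in the text; the 365.25-day year and the even spread of work across all 200 nodes are assumptions made here for illustration, and the function names are hypothetical.

```python
# Figures quoted in the text.
SUBGAMES = 110_565
WORKER_NODES = 199
PARENT_NODES = 1
MINUTES_PER_ITERATION = 61   # reported average per iteration
ITERATIONS = 1_579
WALL_CLOCK_DAYS = 68.5
CORE_YEARS = 900

# Assumption (not from the text): length of a year used for the conversion.
DAYS_PER_YEAR = 365.25


def subgames_per_worker():
    """Average number of subgames handled by each worker node."""
    return SUBGAMES / WORKER_NODES


def iteration_days():
    """Days spent if every iteration took exactly the reported average."""
    return ITERATIONS * MINUTES_PER_ITERATION / (60 * 24)


def implied_cores():
    """Cores kept busy for the whole run, implied by the core-year total."""
    return CORE_YEARS * DAYS_PER_YEAR / WALL_CLOCK_DAYS


if __name__ == "__main__":
    nodes = WORKER_NODES + PARENT_NODES
    print(f"subgames per worker: {subgames_per_worker():.1f}")
    print(f"iteration time:      {iteration_days():.1f} days")
    print(f"implied cores:       {implied_cores():.0f}")
    print(f"implied cores/node:  {implied_cores() / nodes:.1f}")
```

Under these assumptions, each worker handles roughly 556 subgames, the iterations alone account for about 66.9 of the 68.5 days, and the core-year total corresponds to roughly 4,800 cores running for the full wall-clock time. The text does not itemize the remaining time.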
Figure 3 shows the exploitability of the computed strategy with increasing computation. The strategy reaches an
exploitability of 0.986 mbb/g, making HULHE essentially
weakly solved. Using the separate exploitability values for
each position (as the dealer and non-dealer) we get exact
bounds on the game-theoretic value of the game: between
87.7 mbb/g and 89.7 mbb/g for the dealer, proving the common wisdom that the dealer holds a significant advantage.
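The way per-position worst-case values bracket the game value can be illustrated on a toy two-player zero-sum matrix game. This is only a sketch: the payoff matrix, the two approximate strategies, and the function names are hypothetical, and the row player merely stands in for the dealer. The value an approximate row strategy guarantees is a lower bound on the game value, and the most a best-responding row player can win against an approximate column strategy is an upper bound.

```python
def lower_bound(payoff, row_strategy):
    """Payoff the row strategy guarantees against a best-responding column player."""
    n_cols = len(payoff[0])
    return min(
        sum(p * payoff[i][j] for i, p in enumerate(row_strategy))
        for j in range(n_cols)
    )


def upper_bound(payoff, col_strategy):
    """Most a best-responding row player can win against the column strategy."""
    return max(
        sum(q * row[j] for j, q in enumerate(col_strategy))
        for row in payoff
    )


# Hypothetical game: entries are payoffs to the row player ("dealer").
# Its exact value is 0.2, reached by the row strategy (0.4, 0.6).
PAYOFF = [[2.0, -1.0],
          [-1.0, 1.0]]

# Hypothetical approximate equilibrium strategies for each position.
ROW_STRATEGY = [0.45, 0.55]
COL_STRATEGY = [0.35, 0.65]

low = lower_bound(PAYOFF, ROW_STRATEGY)    # about 0.10
high = upper_bound(PAYOFF, COL_STRATEGY)   # about 0.30

# The width of the interval is the sum of the two positions' shortfalls,
# so the exploitability averaged over positions is half the width.
average_exploitability = (high - low) / 2

if __name__ == "__main__":
    print(f"game value lies in [{low:.2f}, {high:.2f}]")
    print(f"average exploitability: {average_exploitability:.2f}")
```

If the reported exploitability is read as an average over the two positions (an assumption here), the same relation is consistent with the figures above: an interval of width 89.7 − 87.7 = 2.0 mbb/g matches 2 × 0.986 ≈ 1.97 mbb/g up to rounding.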
The final strategy, as a close approximation to a Nash
equilibrium, can also answer some fundamental and long-debated questions about game-theoretically optimal play in
HULHE. Figure 4 gives a glimpse of the final strategy in two
early decisions of the game. Human players have disagreed
about whether it may be desirable to “limp,” that is, call as
the very first action rather than raise, with certain hands.
Conventional wisdom is that limping forgoes the opportunity to provoke an immediate fold by the opponent, and so
raising is preferred. Our solution emphatically agrees (see
the absence of blue in Figure 4a). The strategy limps just
0.06% of the time and with no hand more than 0.5%. In other
situations, the strategy gives insights beyond conventional
wisdom, indicating areas where humans might improve.
The strategy rarely “caps,” that is, makes the final allowed
raise, in the first round as the dealer, whereas some strong
human players cap the betting with a wide range of hands.
Even when holding the strongest hand, a pair of aces, the
f The total time and number of core years are larger than was strictly necessary
as they include computation of an average strategy that was later measured to
be more exploitable than the current strategy and so discarded. The total
space noted, on the other hand, is without storing the average strategy.
Figure 3. Exploitability of the approximate solution with increasing
computation. Horizontal axis: computation time (CPU-years).
Figure 4. Action probabilities in the solution strategy for two early
decisions: (a) first action as the dealer; (b) first action as the
non-dealer after a dealer raise. Each cell represents one of the possible 169 hands (i.e.,
two private cards) with the upper diagonal consisting of cards
with the same suit and the lower diagonal consisting of cards of
different suits. The color of the cell represents the action taken: red
for fold, blue for call, and green for raise, with mixtures of colors
representing a stochastic decision.
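The 169-cell layout described in the caption can be reproduced with a short sketch. The rank ordering (ace first) and the labels ("s" for suited, "o" for offsuit) are assumptions made for illustration, as the caption does not specify the axis orientation, and the function names are hypothetical.

```python
# Assumption: ranks run from ace down to two along both axes.
RANKS = "AKQJT98765432"


def hand_grid():
    """13x13 grid: pairs on the diagonal, suited above it, offsuit below it."""
    grid = []
    for r, row_rank in enumerate(RANKS):
        row = []
        for c, col_rank in enumerate(RANKS):
            if r == c:
                row.append(row_rank + col_rank)        # pocket pair
            elif c > r:
                row.append(row_rank + col_rank + "s")  # suited, higher rank first
            else:
                row.append(col_rank + row_rank + "o")  # offsuit, higher rank first
        grid.append(row)
    return grid


def combinations(hand):
    """Number of distinct two-card holdings that a grid cell stands for."""
    if len(hand) == 2:
        return 6                             # C(4, 2) suit pairs for a pocket pair
    return 4 if hand.endswith("s") else 12   # 4 suited, 4 * 3 offsuit


GRID = hand_grid()
CELLS = [hand for row in GRID for hand in row]
TOTAL_COMBINATIONS = sum(combinations(hand) for hand in CELLS)

if __name__ == "__main__":
    for row in GRID:
        print(" ".join(f"{hand:<3}" for hand in row))
    print(len(CELLS), "cells covering", TOTAL_COMBINATIONS, "two-card holdings")
```

The grid has 13 pairs, 78 suited hands, and 78 offsuit hands. Weighting each cell by its number of suit combinations recovers all 1,326 possible two-card holdings, so cells of equal size in such a figure do not represent equally likely hands.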
g These insights were the result of discussions with Mr. Bryce Paradis, previously
a professional poker player who specialized in HULHE.