By David Silver

THE STUDY OF GAMES is as old as computer science itself. Babbage, Turing, and Shannon devised algorithms and hardware to play the game of chess. Game theory began with questions regarding optimal strategies in card games and chess, and was later developed into a formal system by von Neumann. Chess subsequently became the drosophila (the common fruit fly, the most studied organism in genetics) of artificial intelligence research. Early successes in chess and other games shaped the emerging field of AI: many planning algorithms first used in games became pillars of subsequent research; reinforcement learning was first developed for a checkers-playing program; and the performance of game-playing programs has frequently been used to measure progress in the field.

Most of this research focused on perfect information games, in which all events are observed by all players, culminating in programs that beat human world champions in checkers, chess, Othello, backgammon, and most recently, Go. However, many applications in the real world have imperfect information: each agent observes different events. This opens the possibility of deception and a wealth of social strategies. Imperfect information games provide a microcosm of these social interactions, while abstracting away the messiness of the real world.

Among imperfect information games, poker is the most widely studied (the latest drosophila) owing to its enormous popularity and strategic depth. The smallest variant played competitively by humans, and the most widely played by computers, is the two-player game known as Heads-Up Limit Hold'Em (HULHE), in which each player holds two private cards in addition to five public cards. Two decades of research on this game have led to powerful methods, such as counterfactual regret minimization (CFR), for approximating a Nash equilibrium. Several years ago, a program called Polaris, created by many of the authors of the following paper, defeated a human professional poker player in HULHE for the first time.

However, Polaris was still far from perfect; indeed, it turned out in retrospect to be exploitable by a very large margin, owing to the approximations it made. The obvious remaining question was whether a "near-perfect" solution could be found: a strategy so close to a Nash equilibrium that the difference cannot be detected in a lifetime of play.

The following paper takes the CFR methods used in previous work to the next level. Using a number of innovations, and several hundred machine-years of computation, the authors were able to find a near-perfect solution to HULHE. Their solution also provides insights into the game itself, showing that the dealer holds a significant advantage and that seemingly poor hands should be played surprisingly often.

For the game of poker, the next step beyond HULHE is no-limit poker, which has a much larger action space. This too has recently been cracked, with the programs Libratus (from CMU) and DeepStack (also from Alberta) both defeating human professionals using variants of CFR, although a near-perfect solution remains out of reach. The final challenge would be the variant most widely played by humans: multiplayer no-limit poker.

The methods used to solve poker are quite general, and therefore have potential applications far beyond this one game. Many other imperfect information games played by humans, including a wide variety of card games, board games, and video games, are tractable to these methods. Furthermore, there are many real-world applications, such as auctions, negotiations, and security, in which agents receive different information and must make a sequence of decisions to maximize a final payoff; these applications therefore belong to the same class of imperfect information games as HULHE.

Solving a problem attains perfection in one domain. The frontier of solved domains is an incontrovertible measure of current computer capabilities. That frontier has now been extended by one significant step, to include for the first time a challenging imperfect information game.

David Silver leads the reinforcement learning research group at Google DeepMind, London.

Copyright held by author.
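The regret-minimization idea behind CFR can be made concrete with a small sketch. The code below is an illustrative toy, not the implementation from the paper or any of the systems discussed: it applies regret matching, the per-decision update at the heart of CFR, to rock-paper-scissors, where self-play drives the players' average strategies toward the uniform Nash equilibrium. All names and parameters here are illustrative assumptions.

```python
# Toy regret matching (the self-play update at the heart of CFR) on
# rock-paper-scissors. In two-player zero-sum games, the players'
# *average* strategies converge to a Nash equilibrium -- for this game,
# the uniform mix (1/3, 1/3, 1/3).

ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors
# PAYOFF[a][b]: payoff to player 1 when player 1 plays a, player 2 plays b.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

def current_strategy(regrets):
    """Mix over actions in proportion to positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / ACTIONS] * ACTIONS

def solve(iterations=100_000):
    # Deliberately asymmetric initial regrets, so self-play does not
    # start exactly at the uniform fixed point.
    regrets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    strategy_sum = [[0.0] * ACTIONS, [0.0] * ACTIONS]
    for _ in range(iterations):
        s1 = current_strategy(regrets[0])
        s2 = current_strategy(regrets[1])
        for a in range(ACTIONS):
            strategy_sum[0][a] += s1[a]
            strategy_sum[1][a] += s2[a]
        # Value of each pure action against the opponent's current mix
        # (player 2's payoffs are the negation, since the game is zero-sum).
        v1 = [sum(PAYOFF[a][b] * s2[b] for b in range(ACTIONS)) for a in range(ACTIONS)]
        v2 = [-sum(PAYOFF[a][b] * s1[a] for a in range(ACTIONS)) for b in range(ACTIONS)]
        u1 = sum(s1[a] * v1[a] for a in range(ACTIONS))
        u2 = sum(s2[b] * v2[b] for b in range(ACTIONS))
        # Accumulate the regret for not having committed to each action.
        for a in range(ACTIONS):
            regrets[0][a] += v1[a] - u1
            regrets[1][a] += v2[a] - u2
    # The averaged strategies, which approach the equilibrium.
    return ([s / iterations for s in strategy_sum[0]],
            [s / iterations for s in strategy_sum[1]])

avg1, avg2 = solve()
print(avg1, avg2)  # each probability tends toward 1/3
```

Full CFR applies this same update at every information set of the game tree, weighting each action's value by the probability of reaching that set; the toy above collapses the game to a single decision point, which is why it fits in a page while solving HULHE took machine-years.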