alphacats

v0.0.0-...-014f423

This package is not in the latest version of its module.

Published: Nov 6, 2020 License: GPL-3.0 Imports: 10 Imported by: 0

README

AlphaCats

AlphaCats was a failed attempt to solve the game of Exploding Kittens using Deep Counterfactual Regret Minimization. AlphaCats is built around the go-cfr package.

Due to the depth of the game tree, external sampling is intractable, and other forms of MC-CFR sampling (such as outcome sampling) led to high-variance samples and a model that struggled to converge.

Future areas of investigation could include variance-reduction and improved sampling techniques.

Usage

cmd/alphacats is the main driver binary. CFR iteration can be launched with:

./cmd/alphacats/alphacats -logtostderr \
    -decktype core -cfrtype deep -iter 10 \
    -sampling.num_sampling_threads 5000 \
    -sampling.max_num_actions 2 \
    -sampling.exploration_eps 1.0 \
    -deepcfr.traversals_per_iter 10000 \
    -deepcfr.buffer.size 10000000 \
    -deepcfr.model.num_encoding_workers 4 \
    -deepcfr.model.batch_size 10000 \
    -deepcfr.model.max_inference_batch_size 10000 \
    -output_dir output -v 1 2>&1 | tee run.log

This will run DeepCFR with a reservoir buffer of size 10 million, and sample the game tree using robust sampling with K=2.
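As a rough illustration of what a reservoir buffer does, the sketch below keeps a uniform random subset of a sample stream using the classic Algorithm R. It is not the go-cfr implementation: the `Sample` type, `ReservoirBuffer`, and its methods are hypothetical names chosen for this example.

```go
package main

import (
	"fmt"
	"math/rand"
)

// Sample is a hypothetical stand-in for one training sample.
type Sample struct {
	ID int
}

// ReservoirBuffer keeps a uniform random subset of at most maxSize samples
// drawn from a stream of unknown length (Algorithm R).
type ReservoirBuffer struct {
	maxSize int
	seen    int // total number of samples offered so far
	samples []Sample
	rng     *rand.Rand
}

// NewReservoirBuffer creates an empty buffer with the given capacity.
func NewReservoirBuffer(maxSize int, seed int64) *ReservoirBuffer {
	return &ReservoirBuffer{
		maxSize: maxSize,
		samples: make([]Sample, 0, maxSize),
		rng:     rand.New(rand.NewSource(seed)),
	}
}

// Add offers one sample to the buffer.
func (b *ReservoirBuffer) Add(s Sample) {
	b.seen++
	if len(b.samples) < b.maxSize {
		b.samples = append(b.samples, s)
		return
	}
	// Once full, the new sample replaces a random slot with
	// probability maxSize/seen, so every sample seen so far is
	// equally likely to be in the buffer.
	if j := b.rng.Intn(b.seen); j < b.maxSize {
		b.samples[j] = s
	}
}

func main() {
	buf := NewReservoirBuffer(5, 1)
	for i := 0; i < 1000; i++ {
		buf.Add(Sample{ID: i})
	}
	fmt.Println(len(buf.samples), buf.seen) // prints "5 1000"
}
```

The buffer never grows past its capacity, which is why a fixed size such as the 10 million above bounds memory no matter how many traversals are run.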

Tabular CFR can also be launched with -cfrtype tabular. Because it requires a large amount of memory, a smaller test game can be selected with -decktype test. Tabular CFR is not thread-safe and must be run with -sampling.num_sampling_threads 1.

Model

The underlying model used in AlphaCats is an LSTM over the game history that feeds forward into a deep fully connected network.

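# Imports assumed for this excerpt (standalone Keras 2.x, which exposes
# CuDNNLSTM in keras.layers). history_shape, hands_shape and N_OUTPUTS are
# expected to be defined elsewhere in the training script.
from keras.layers import (BatchNormalization, Bidirectional, CuDNNLSTM,
                          Dense, Input, concatenate)
from keras.models import Model
from keras.optimizers import Adam

# The history (LSTM) arm of the model.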
history_input = Input(name="history", shape=history_shape)
lstm = Bidirectional(CuDNNLSTM(32, return_sequences=False))(history_input)

# The private hand arm of the model.
hands_input = Input(name="hands", shape=hands_shape)

# Concatenate and predict advantages.
merged_inputs = concatenate([lstm, hands_input])
merged_hidden_1 = Dense(128, activation='relu')(merged_inputs)
merged_hidden_2 = Dense(128, activation='relu')(merged_hidden_1)
merged_hidden_3 = Dense(128, activation='relu')(merged_hidden_2)
merged_hidden_4 = Dense(64, activation='relu')(merged_hidden_3)
merged_hidden_5 = Dense(64, activation='relu')(merged_hidden_4)
normalization = BatchNormalization()(merged_hidden_5)
advantages_output = Dense(N_OUTPUTS, activation='linear', name='output')(normalization)

model = Model(
    inputs=[history_input, hands_input],
    outputs=[advantages_output])
model.compile(
    loss='mean_squared_error',
    optimizer=Adam(clipnorm=1.0),
    metrics=['mean_absolute_error'])

See model/train.py for the training script. During training, samples are first generated using a go-cfr sampler, saved to *.npz files, and then loaded by the script in minibatches. The resulting model is saved in TensorFlow format, and loaded for inference (see model/lstm.go).

Documentation

Index

Constants

const (
	PlayTurn turnType = iota
	GiveCard
	ShuffleDrawPile
	MustDefuse
	InsertKittenRandom
	GameOver
)

Functions

func CountDistinctShuffles

func CountDistinctShuffles(deck cards.Set) int
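CountDistinctShuffles is undocumented on this page. Assuming it counts the distinct orderings of a deck that may hold several copies of the same card, the count is the multinomial coefficient n! / (c_1! * ... * c_k!). The sketch below computes it from a plain slice of per-card counts; the function name and the `[]int` input are hypothetical stand-ins for the real `cards.Set` API.

```go
package main

import "fmt"

// countDistinctShuffles returns n! / (c_1! * c_2! * ... * c_k!), where
// counts[i] is the number of copies of card type i and n is their sum.
// It builds the result as a product of binomial coefficients so that every
// intermediate division is exact. Large decks can still overflow int.
func countDistinctShuffles(counts []int) int {
	result := 1
	placed := 0
	for _, c := range counts {
		// Multiply by C(placed+c, c): the number of ways to choose
		// positions for the c copies of this card type.
		for i := 1; i <= c; i++ {
			placed++
			result = result * placed / i
		}
	}
	return result
}

func main() {
	// Two copies of one card plus two unique cards: 4! / 2! = 12.
	fmt.Println(countDistinctShuffles([]int{2, 1, 1})) // prints "12"
}
```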

func EnumerateDealsWithP0Hand

func EnumerateDealsWithP0Hand(deck, p0Deal cards.Set, cb func(d Deal))

func EnumerateDealsWithP1Hand

func EnumerateDealsWithP1Hand(deck, p1Hand cards.Set, cb func(d Deal))

func EnumerateInitialDeals

func EnumerateInitialDeals(deck cards.Set, cardsPerPlayer int, cb func(d Deal))

func EnumerateShuffles

func EnumerateShuffles(deck cards.Set, cb func(shuffle cards.Stack))
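The callback style of EnumerateShuffles can be illustrated with a small backtracking enumerator. This is only a sketch under the assumption that each distinct ordering should be visited exactly once even when the deck has duplicate cards; it uses string slices and a hypothetical `enumerateShuffles` helper rather than the package's `cards.Set` and `cards.Stack` types.

```go
package main

import "fmt"

// enumerateShuffles calls cb once for every distinct ordering of a deck.
// types[i] is a card name and counts[i] is how many copies the deck holds.
func enumerateShuffles(types []string, counts []int, cb func(shuffle []string)) {
	n := 0
	for _, c := range counts {
		n += c
	}
	remaining := append([]int(nil), counts...) // leave the caller's slice untouched
	current := make([]string, 0, n)

	var recurse func()
	recurse = func() {
		if len(current) == n {
			cb(append([]string(nil), current...)) // hand the callback its own copy
			return
		}
		// Branching on card type (not on individual copies) is what
		// prevents duplicate orderings from being generated.
		for i, t := range types {
			if remaining[i] == 0 {
				continue
			}
			remaining[i]--
			current = append(current, t)
			recurse()
			current = current[:len(current)-1]
			remaining[i]++
		}
	}
	recurse()
}

func main() {
	enumerateShuffles([]string{"Skip", "Defuse"}, []int{2, 1}, func(s []string) {
		fmt.Println(s)
	})
}
```

With two Skip cards and one Defuse card this visits three orderings, matching the multinomial count 3! / 2! = 3.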

Types

type AbstractedInfoSet

type AbstractedInfoSet struct {
	Player           gamestate.Player
	PublicHistory    gamestate.History
	Hand             cards.Set
	P0PlayedCards    cards.Set
	P1PlayedCards    cards.Set
	DrawPile         cards.Stack
	AvailableActions []gamestate.Action
}

AbstractedInfoSet abstracts away private history. The main difference in this abstraction is that the exact ordering in which private cards were received in the history is neglected. A second difference is that cards known to be in the draw pile (but not known where) are forgotten. This can happen if a SeeTheFuture card is played followed by a shuffle.

func (*AbstractedInfoSet) Key

func (is *AbstractedInfoSet) Key() []byte

Key implements cfr.InfoSet.

func (*AbstractedInfoSet) MarshalBinary

func (is *AbstractedInfoSet) MarshalBinary() ([]byte, error)

func (AbstractedInfoSet) String

func (a AbstractedInfoSet) String() string

func (*AbstractedInfoSet) UnmarshalBinary

func (is *AbstractedInfoSet) UnmarshalBinary(buf []byte) error

type BeliefState

type BeliefState struct {
	// contains filtered or unexported fields
}

BeliefState holds the distribution of probabilities over underlying game states as perceived from the point of view of one player.

func NewBeliefState

func NewBeliefState(opponentPolicy func(cfr.GameTreeNode) []float32, infoSet gamestate.InfoSet) *BeliefState

Return all game states consistent with the given initial hand. Note that the passed hand should include the Defuse card.

func (*BeliefState) Clone

func (bs *BeliefState) Clone() *BeliefState

func (*BeliefState) Len

func (bs *BeliefState) Len() int

func (*BeliefState) Less

func (bs *BeliefState) Less(i, j int) bool

func (*BeliefState) SampleDeterminization

func (bs *BeliefState) SampleDeterminization() *GameNode

func (*BeliefState) Swap

func (bs *BeliefState) Swap(i, j int)

func (*BeliefState) Update

func (bs *BeliefState) Update(infoSet gamestate.InfoSet)

Update belief state by propagating all current states forward, expanding determinizations as necessary and filtering to those that match the given new info set.
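The propagate-then-filter step described for Update resembles a Bayesian filter over candidate states. The sketch below shows that idea on a toy example. All names in it (`state`, `weighted`, `update`, and the toy opponent policy) are hypothetical and are not taken from the package; the real BeliefState works on game nodes and info sets.

```go
package main

import "fmt"

// state is a toy stand-in for a fully determined game state.
type state struct {
	oppCard    string // hidden from the observing player
	lastAction string // publicly visible
}

// weighted pairs a state with its probability.
type weighted struct {
	s state
	p float64
}

// update propagates every state in the belief through successors, keeps only
// the successors whose visible action matches what was observed, and
// renormalizes the surviving probabilities.
func update(belief []weighted, successors func(state) []weighted, observed string) []weighted {
	mass := map[state]float64{}
	var order []state // remembers first-seen order for stable output
	total := 0.0
	for _, b := range belief {
		for _, next := range successors(b.s) {
			if next.s.lastAction != observed {
				continue // inconsistent with the new observation
			}
			if _, ok := mass[next.s]; !ok {
				order = append(order, next.s)
			}
			p := b.p * next.p
			mass[next.s] += p
			total += p
		}
	}
	out := make([]weighted, 0, len(order))
	for _, s := range order {
		out = append(out, weighted{s, mass[s] / total})
	}
	return out
}

// toyPolicy is an assumed opponent policy: a player holding Skip plays it
// 80% of the time, a player holding Attack plays it 20% of the time.
func toyPolicy(s state) []weighted {
	playProb := 0.2
	if s.oppCard == "Skip" {
		playProb = 0.8
	}
	return []weighted{
		{state{s.oppCard, "play"}, playProb},
		{state{s.oppCard, "pass"}, 1 - playProb},
	}
}

func main() {
	prior := []weighted{{state{"Skip", ""}, 0.5}, {state{"Attack", ""}, 0.5}}
	for _, w := range update(prior, toyPolicy, "play") {
		fmt.Printf("%s %.2f\n", w.s.oppCard, w.p)
	}
}
```

After observing "play", the belief shifts to roughly 0.8 for Skip and 0.2 for Attack, since the assumed policy makes that action four times as likely when holding Skip.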

type Deal

type Deal struct {
	DrawPile cards.Stack
	P0Deal   cards.Set
	P1Deal   cards.Set
}

func NewRandomDeal

func NewRandomDeal(deck []cards.Card, cardsPerPlayer int) Deal

func NewRandomDealWithConstraints

func NewRandomDealWithConstraints(drawPile cards.Stack, p1Hand cards.Set) Deal

type GameNode

type GameNode struct {
	// contains filtered or unexported fields
}

GameNode implements cfr.GameTreeNode for Exploding Kittens. GameNode represents a state of play in the extensive-form game tree.

func NewGame

func NewGame(drawPile cards.Stack, p0Deal, p1Deal cards.Set) *GameNode

NewGame creates a root node for a new game with the given draw pile and hands dealt to each player.

func (*GameNode) Clone

func (gn *GameNode) Clone() *GameNode

func (*GameNode) CloneWithState

func (gn *GameNode) CloneWithState(state gamestate.GameState) *GameNode

func (*GameNode) Close

func (gn *GameNode) Close()

Close implements cfr.GameTreeNode.

func (*GameNode) Depth

func (gn *GameNode) Depth() int

func (*GameNode) GetChild

func (gn *GameNode) GetChild(i int) cfr.GameTreeNode

GetChild implements cfr.GameTreeNode.

func (*GameNode) GetChildProbability

func (gn *GameNode) GetChildProbability(i int) float64

GetChildProbability implements cfr.GameTreeNode.

func (*GameNode) GetDrawPile

func (gn *GameNode) GetDrawPile() cards.Stack

func (*GameNode) GetHistory

func (gn *GameNode) GetHistory() gamestate.History

func (*GameNode) GetInfoSet

func (gn *GameNode) GetInfoSet(player gamestate.Player) gamestate.InfoSet

func (*GameNode) GetState

func (gn *GameNode) GetState() gamestate.GameState

func (*GameNode) InfoSet

func (gn *GameNode) InfoSet(player int) cfr.InfoSet

InfoSet implements cfr.GameTreeNode.

func (*GameNode) InfoSetKey

func (gn *GameNode) InfoSetKey(player int) []byte

func (*GameNode) LastAction

func (gn *GameNode) LastAction() gamestate.Action

func (*GameNode) NumChildren

func (gn *GameNode) NumChildren() int

func (*GameNode) Parent

func (gn *GameNode) Parent() cfr.GameTreeNode

func (*GameNode) Player

func (gn *GameNode) Player() int

Player implements cfr.GameTreeNode.

func (*GameNode) SampleChild

func (gn *GameNode) SampleChild() (cfr.GameTreeNode, float64)

SampleChild implements cfr.GameTreeNode.

func (*GameNode) String

func (gn *GameNode) String() string

String implements fmt.Stringer.

func (*GameNode) Type

func (gn *GameNode) Type() cfr.NodeType

Type implements cfr.GameTreeNode.

func (*GameNode) Utility

func (gn *GameNode) Utility(player int) float64

Utility implements cfr.GameTreeNode.

type InfoSetWithAvailableActions

type InfoSetWithAvailableActions struct {
	gamestate.InfoSet
	AvailableActions []gamestate.Action
}

func (*InfoSetWithAvailableActions) MarshalBinary

func (is *InfoSetWithAvailableActions) MarshalBinary() ([]byte, error)

func (*InfoSetWithAvailableActions) UnmarshalBinary

func (is *InfoSetWithAvailableActions) UnmarshalBinary(buf []byte) error

Directories

Path                       Synopsis
cmd/alphacats              This version of alphacats uses one-sided IS-MCTS with a NN to guide search, in a PSRO framework.
cmd/alphacats_bootstrap    Generate training samples for PSRO network bootstrap by playing games with Smooth UCT search.
cmd/alphacats_mcts         This version of alphacats uses Smooth UCT MCTS only.
cmd/count_game_nodes       Script to estimate the number of nodes touched in an external sampling run.
model                      Package model implements an LSTM-based network model for use in MCTS.
internal/npyio             Package npyio is a fork of github.com/sbinet/npyio that is hard-coded for []float32s to avoid reflection.
internal/tffloats          Package tffloats constructs *tf.Tensors from []float32 slices, avoiding reflection.
