im trying to implement a tic-tac-toe game using minimax algortihm. I have gotten the program running but it dosen't chose an optimal move and im really lost why it dosen't, it is probably a problem in minimax function but i maybe completley wrong. So for example if i put x anywhere it always begings at 3,3
1 2 3
1
2 X
3 O
It also don't try to block the 3 in a row so if i put another x in 1,1 I get this result:
1 2 3
1 X
2 X
3 O O
It feels like it activley have switched to doing the worst move instead, I end up with a reult looking like this:
1 2 3
1 X X O
2 O X
3 X O O
This is consistent with different states of the board.
Cred to Luke Miles on github for pieces of the code!
from abc import ABC, abstractmethod
import numpy as np
import math
#create the nodes for a board state
class Node(ABC):
@abstractmethod
def find_children(self):
"All possible successors of this board state"
return set()
@abstractmethod
def is_terminal(self):
"Returns True if the node has no children"
return True
@abstractmethod
def reward(self):
"Assumes `self` is terminal node. 1=win, 0=loss, .5=tie, etc"
return 0
class MinimaxTreeSearch:
def __init__(self, depth=5):
self.depth = depth # Define search depth limit
def choose(self, node):
if node.is_terminal() or self.depth==0:
raise RuntimeError(f"choose called on terminal node {node}")
best_move,_ = self.minimax(node, self.depth,maxi=True)
return best_move
def minimax(self, node, depth,maxi):
if depth == 0 or node.is_terminal():
return None, self.evaluate(node)
moves = list(node.find_children())
if maxi:
best = -math.inf
best_move = None
for move in moves:
_, value = self.minimax(node=move, depth=depth-1,maxi=False)
if value > best:
best = value
best_move = move
return best_move, best
else:
best = math.inf
best_move = None
for move in moves:
_, value = self.minimax(node=move, depth=depth-1,maxi=True)
if value < best:
best = value
best_move = move
return best_move, best
def evaluate(self, node):
if node.is_terminal():
return node.reward() # Use game-specific reward
return 0 # Neutral score for intermediate states
class ticboard(Node):
def __init__(board, board_state=[None,] * 9, winner=None,turn=True, terminal=False):
board.board_state=board_state
board.turn = turn
board.winner = winner
board.terminal = terminal
def find_children(board):
if board.terminal:
return set()
return{board.make_move(i) for i, value in enumerate(board.board_state) if value is None}
def reward(board):
if not board.terminal:
raise RuntimeError(f"reward on no terminal board {board}")
if board.winner is None:
return 0.5
if board.winner == board.turn:
return 1
return 0
def is_terminal(board):
return board.terminal
def make_move(board, index):
board_state= board.board_state.copy()
board_state[index] = board.turn
turn = not board.turn
winner = find_winner(board_state)
is_terminal = (winner is not None) or not any(spot is None for spot in board_state)
return ticboard(board_state,winner,turn,is_terminal)
def to_pretty_string(board):
to_char = lambda v: ("X" if v is True else ("O" if v is False else " "))
rows = [
[to_char(board.board_state[3 * row + col]) for col in range(3)] for row in range(3)
]
return (
"\n 1 2 3\n"
+ "\n".join(str(i + 1) + " " + " ".join(row) for i, row in enumerate(rows))
+ "\n"
)
def play_game():
tree = MinimaxTreeSearch(depth=5)
board = ticboard()
print(board.to_pretty_string())
while True:
row_col = input("enter row,col: ")
row, col = map(int, row_col.split(","))
index = 3 * (row - 1) + (col - 1)
if board.board_state[index] is not None:
raise RuntimeError("Invalid move")
board = board.make_move(index)
print(board.to_pretty_string())
if board.terminal:
break
board = tree.choose(board)
print(board.to_pretty_string())
if board.terminal:
break
def win_combos():
combos=[]
for start in range(0, 9, 3): # three in a row
combos.append([start, start + 1, start + 2])
for start in range(3): # three in a column
combos.append([start, start + 3, start + 6])
combos.append([0, 4, 8])
combos.append([2, 4, 6])
return(combos)
def find_winner(board_state):
win_combo= win_combos()
for combo in win_combo:
for player in [True,False]:
a, b ,c = combo
if board_state[a] == board_state[b] == board_state[c] and board_state[a] is player:
return player
return None
if __name__ == "__main__":
play_game()
i have tried using node.turn instead of maxi(maximizingplayer) but i have a hard time grasping on how I should use it. Thanks for any help and have a wonderful day!
There are these issues:
In reward
you return 1 (maximum score) when the winner equals the player whose turn it is. But as the winning move is always the last played move, and the board that was generated will have the turn
set to the other player, this condition is never true, and so the reward
function never returns 1! What you need here is to check if the winner is the maximizing player. As your game loop has the human player play with X (turn
is True
), the choose
method is called when O is to play (turn
is False
). You pass maxi=True
in that method, which means you consider O to be the maximizing player. Consequently, reward
should return 1 when winner
is O, i.e. when turn
is False
. So correct that last if
statement in reward
to:
if board.winner == False:
return 1
evaluate
returns 0 when the state is not terminal. A comment clarifies that this is a neutral score. But that is a mistake. Since your reward
function returns 0 when a certain player (see above) wins, this is certainly not a neutral score. To be consistent with what reward
uses as its score scale (0, 0.5, 1), the neutral score is 0.5. So correct the final statement in evaluate
to:
return 0.5
In order to have strong play, you need to provide a depth of at least 6. With 5 you can still get a bad move. Be aware the depth counts plies, i.e. where also the opponent moves count as 1, so with 5 you will only consider a second opponent move. So change the first statement in play_game
to:
tree = MinimaxTreeSearch(depth=6)
This will fix the problems you encountered.
There is much more to say about this code. For instance:
win_combos
will always return the same result, yet you call it repeatedly.
find_winner
doesn't need a loop over [False,True]
. Just return the content of one of the three board cells that is part of a three-in-a-row.
There is no need for the maxi
parameter as you can just read the value of turn
to decide on that. This will also make it possible to use the choose
method for the X player.
The first parameter in a method definition should by common standards be named self
, not board
numpy
is imported but never used
not any(spot is None for spot in board_state)
is an inefficient way to do None not in board_state
If you would start the game with a partially filled board, which might be a loss for the player that uses minimax, your algorithm will not prefer to play moves that delay a loss. You can get this to work by reducing the "severeness" of a score (i.e. it gets closer to the neutral score) the more you get out of recursion. This way you will have more extreme scores for quick wins/losses.
A lot can be improved with the way you store the board. Instead of a list you could use bitboard techniques so that moves are nothing more then setting a bit in an integer, and win-detection can achieved by performing a few binary AND operations...etc.