Search code examples
pythonrecursionartificial-intelligencetic-tac-toeminimax

Minimax Tic-Tac-Toe Python Logical Issue


Having trouble with the logic of my minimax function. All other helper functions are tested and performing properly. When the AI of my tic-tac-toe game reaches the minimax function, the error I get is "TypeError: '\<' not supported between instances of 'tuple' and 'float'", so I'm guessing it's an issue with comparing v and the new_action tuple that is returned? So far unable to diagnose the issue in the logic of minimax. Min player O has a utility of -1, 0 for a tie, and max player X has a utility of 1. ("state" is the 2D array of the game board)

def minimax(state):
    def max_value(state):
        if terminal(state): return utility(state)
        v = float('-inf')
        new_action = None
        for action in actions(state):
            temp = max(v, min_value(result(state, action)))
            if temp > v:
                v = temp
                new_action = action
        return new_action
        
    def min_value(state):
        if terminal(state): return utility(state)
        v = float('inf')
        new_action = None
        for action in actions(state):
            temp = min(v, max_value(result(state, action)))
            if temp < v:
                v = temp
                new_action = action
        return new_action

    if terminal(state): return None

    if player(state) == X: max_value(state)
    else: min_value(state)

I've looked into the logic of minimax functions and pseudocode but haven't been able to make progress.


Solution

  • There are a few issues:

    • The final if..else statement in minimax doesn't return anything, so minimax always returns None

    • There is a mix up of return value for min_value and max_value. On the one hand, the expression max(v, min_value(result(state, action))) expects min_value to return a value (score), while on the other hand, return new_action returns the action (move) -- which the ultimate caller of minimax would also like to get.

    The solution is to have min_value and max_value return a pair so the caller has both the value (score) and the action (move). For that reason I would also rename these functions to minimize and maximize (as they don't only return the value)

    Note that there is no need to call max or min where you have it. Also, there is no need to check for a terminal state in the main body of the minimax function, as this is the first check that the inner functions make anyhow.

     def minimax(state):
    
        def maximize(state):
            if terminal(state):
                return (utility(state), None)
            best = (float('-inf'), None)  # Pair of value and action
            for action in actions(state):
                temp = minimize(result(state, action))  # We get a pair
                if temp > best:
                    best = (temp, action)
            return best  # Return the pair of value and action
            
        def minimize(state):
            if terminal(state):
                return (utility(state), None)
            best = (float('inf'), None)  # Pair of value and action
            for action in actions(state):
                temp = maximize(result(state, action))  # We get a pair
                if temp < best:  # Compare pairs by value first
                    best = (temp, action)
            return best  # Return the pair of value and action
    
        # Choose which function to call
        optimize = (minimize, maximize)[player(state) == X]
        return optimize(state)[1]  # Extract the move from the pair and return it
    

    If action objects are not comparable, then compare the first pairmembers only, like if temp[0] < best[0].

    The statement to set optimize is short for:

    if player(state) == X:
        optimize = maximize
    else:
        optimize = minimize
    

    As functions are objects like other types are, you can just assign them and then call them (as optimize references a function).

    Without that extra name optimize you'd do:

    if player(state) == X:
        return maximize(state)[1]
    else:
        return minimize(state)[1]