python recursion artificial-intelligence tic-tac-toe minimax

Minimax Tic-Tac-Toe Python Logical Issue

Having trouble with the logic of my minimax function. All other helper functions are tested and performing properly. When the AI of my tic-tac-toe game reaches the minimax function, the error I get is "TypeError: '\<' not supported between instances of 'tuple' and 'float'", so I'm guessing it's an issue with comparing v and the new_action tuple that is returned? So far unable to diagnose the issue in the logic of minimax. Min player O has a utility of -1, 0 for a tie, and max player X has a utility of 1. ("state" is the 2D array of the game board)

def minimax(state):
    def max_value(state):
        if terminal(state): return utility(state)
        v = float('-inf')
        new_action = None
        for action in actions(state):
            temp = max(v, min_value(result(state, action)))
            if temp > v:
                v = temp
                new_action = action
        return new_action
        
    def min_value(state):
        if terminal(state): return utility(state)
        v = float('inf')
        new_action = None
        for action in actions(state):
            temp = min(v, max_value(result(state, action)))
            if temp < v:
                v = temp
                new_action = action
        return new_action

    if terminal(state): return None

    if player(state) == X: max_value(state)
    else: min_value(state)

I've looked into the logic of minimax functions and pseudocode but haven't been able to make progress.

Solution

There are a few issues:

The final if..else statement in minimax doesn't return anything, so minimax always returns None
There is a mix up of return value for min_value and max_value. On the one hand, the expression max(v, min_value(result(state, action))) expects min_value to return a value (score), while on the other hand, return new_action returns the action (move) -- which the ultimate caller of minimax would also like to get.

The solution is to have min_value and max_value return a pair so the caller has both the value (score) and the action (move). For that reason I would also rename these functions to minimize and maximize (as they don't only return the value)

Note that there is no need to call max or min where you have it. Also, there is no need to check for a terminal state in the main body of the minimax function, as this is the first check that the inner functions make anyhow.

 def minimax(state):

    def maximize(state):
        if terminal(state):
            return (utility(state), None)
        best = (float('-inf'), None)  # Pair of value and action
        for action in actions(state):
            temp = minimize(result(state, action))  # We get a pair
            if temp > best:
                best = (temp, action)
        return best  # Return the pair of value and action
        
    def minimize(state):
        if terminal(state):
            return (utility(state), None)
        best = (float('inf'), None)  # Pair of value and action
        for action in actions(state):
            temp = maximize(result(state, action))  # We get a pair
            if temp < best:  # Compare pairs by value first
                best = (temp, action)
        return best  # Return the pair of value and action

    # Choose which function to call
    optimize = (minimize, maximize)[player(state) == X]
    return optimize(state)[1]  # Extract the move from the pair and return it

If action objects are not comparable, then compare the first pairmembers only, like if temp[0] < best[0].

The statement to set optimize is short for:

if player(state) == X:
    optimize = maximize
else:
    optimize = minimize

As functions are objects like other types are, you can just assign them and then call them (as optimize references a function).

Without that extra name optimize you'd do:

if player(state) == X:
    return maximize(state)[1]
else:
    return minimize(state)[1]