Efficient way to implement Minimax AI for Tic Tac Toe orders 3, 4, 5?

I want to create an artificial intelligence that plays Tic Tac Toe better than any human.

I have already written a completely working GUI Tic Tac Toe game with AI players, but all those AI players are bad. So I have created another AI using reinforcement learning, it works and is guaranteed to not to lose more than 99% of the time, but it doesn't win as often.

I want to do better, so I want to use Minimax algorithm to implement the AI. I have found an example online, but the code quality is very poor. I am able to reimplement it more efficiently, but I don't know the most efficient way to implement it.

I want the AI for 3x3, 4x4 and 5x5 Tic Tac Toe games. These games end when there is a line (row, column and diagonal) that is completely filled with the same player's pieces, or the whole board is filled.

There are 3 possible states each cell can be in, on order n board there are n² cells, so for order n game of Tic Tac Toe there are a total of 3^n² possible states, regardless of validity, discounting reflections and rotations. For order 3 there are 19,683 possibilities which is quite small, for order 4 there are 43,046,721 possibilities, that is more than 43 million, a big number, but bruteforceable. For order 5 that is 847,288,609,443, more than 847 billion, it is a huge number and not bruteforceable.

I have checked all possible order 3 Tic Tac Toe states, and systematically enumerated all possible order 3 Tic Tac Toe states if a given player moves first. There are 5478 states reachable if one player moves first, and a total of 8533 states reachable via normal gameplay.

I want to know, what are more efficient ways to check whether a given state is legal, whether a state is a game over state, and whether a state can let one player win in one move.

A state is legal if it can be reached via normal gameplay, a state is a game over state if there are no empty spots in some line and all pieces on the line are the same, and a state can let one player win in one move if there is exactly one empty spot and all other pieces are the same. When the game is one move away from ending I want the AI only consider such gaps.

A state can be reached via normal gameplay if the absolute difference between the counts of pieces is less than or equal to 1, because players take turns alternatively, one player cannot move in two consecutive turns. It must also satisfy that there are no two winners, because the game ends when there is a winner. And It must also satisfy that the loser's moves are no greater than the winner's, because that would mean the loser made a move after there is a winner, which is illegal.

I have solved the aforementioned problems for 3x3 Tic Tac Toe using loops and sets, implementing the logic I mentioned, but I don't think they are very efficient.

(A board is represented by a flat iterable, empty spots are represented as " ", and players "O", "X", the cell with index i corresponds to *divmod(i, 3) on the square board)

LINES = (
    (0, 3, 1),
    (3, 6, 1),
    (6, 9, 1),
    (0, 7, 3),
    (1, 8, 3),
    (2, 9, 3),
    (0, 9, 4),
    (2, 7, 2),
)


def is_valid(board: str) -> bool:
    winners = set()
    winner = None
    for start, stop, step in LINES:
        line = board[start:stop:step]
        if len(set(line)) == 1 and (winner := line[0]) in {"O", "X"}:
            winners.add(winner)

    return (
        len(winners) <= 1
        and abs(board.count("O") - board.count("X")) <= 1
        and (not winner or board.count(winner) >= board.count("OX".replace(winner, "")))
    )


def is_playable(board: str) -> bool:
    for start, stop, step in LINES:
        line = board[start:stop:step]
        if len(set(line)) == 1 and line[0] in {"O", "X"}:
            return False

    return " " in board and abs(board.count("O") - board.count("X")) <= 1


def check_state(board: str) -> Tuple[bool, str]:
    for start, stop, step in LINES:
        line = board[start:stop:step]
        if len(set(line)) == 1 and (winner := line[0]) in {"O", "X"}:
            return True, winner

    return " " not in board, None


def find_gaps(board: str, piece: str) -> int:
    gaps = []
    for start, end, step in LINES:
        line = board[start:end:step]
        if line.count(piece) == 2 and " " in line:
            gaps.append(start + line.index(" ") * step)

    return gaps

What are more efficient ways to achieve the above tasks?

Update

I did some tests:

In [14]: board = "OXOX     "

In [15]: %timeit board[0:3:1]
109 ns ± 0.806 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [16]: %timeit (board[0], board[1], board[2])
124 ns ± 1.09 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [17]: FILLED = {('X', 'X', 'X'), ('O', 'O', 'O')}

In [18]: %timeit (board[0], board[1], board[2]) in FILLED
162 ns ± 1.39 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

In [19]: %timeit len(set(board[0:3:1])) == 1
348 ns ± 10.6 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [20]: %timeit tuple(board[0:3:1]) in FILLED
301 ns ± 8.03 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [21]: new_board = [1 if s == 'O' else (-1 if s == 'X' else 0) for s in board]

In [22]: new_board
Out[22]: [1, -1, 1, -1, 0, 0, 0, 0, 0]

In [23]: %timeit sum(new_board[0:3:1])
269 ns ± 9.68 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [24]: %timeit tuple(board[i] for i in (0, 1, 2))
699 ns ± 13 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

In [25]: %timeit a, b, c = (0, 1, 2); (board[a], board[b], board[c])
146 ns ± 1.69 ns per loop (mean ± std. dev. of 7 runs, 10,000,000 loops each)

The most efficient way I can find is to store the indices of cells of rows, columns and diagonals, loop through the collection of indices tuples, and unpack the tuple to use each element to retrieve the cell of the board and build a tuple for that line.

Then whether a line is full can be checked by checking if the tuple is ('O',)*n or ('X',)*n. Whether there is a gap can be checked by membership checking, there are only n possibilities for a given player where there is a single line that is exactly filled with n-1 pieces of the player and the other spot is empty. So two collections, one for each player, can be used to find such gaps. Using a dictionary results in faster membership checking.

This is the best I can do but I don't know if it is super efficient.

Solution

A simple way to check for a winning position for small boards could be using bit tables. In 3x3 you have 9 squares. Consider the X marks as ones and empty or O marks as zeros... any board can be seen as a number 0...511 (511=2⁹-1). This means that with an array of 512 booleans checking for if the board contains 3 X marks in a row is just a lookup on wtab[index].

Thus you could represent the board using two numbers: the positions where X marks are and the positions where O marks are. Even better as a tuple with positions where who is next to play has marks and where the opponent has marks.

For example the function that accepts a position and a valid move and returns the updated position becomes simply:

def play(pos, move):
    # pos[0] is player, pos[1] is opponent
    # we need to add the mark for the player and swap roles
    return (pos[1], pos[0] | (1 << move))

and checking if who played just won the game is

if wtab[pos[1]]: ...

For 4x4 board there are 16 squares (65536 elements, still a reasonable solution... especially if compressing to 1-bit per entry = 8Kb) and for 5x5 there are 25 squares (32M elements, that can be trivially packed to 4Mb... but starts making not much sense IMO, at that size just checking neighbors is probably faster).

Note also that caring about speed and using Python for this kind of problem sounds funny. With Python and this type of programs you're already paying a 100x slowdown compared to C/C++/C#/Java. Even writing your code with Javascript would give you a 10 times faster program with a similar level of abstraction in terms of language. If using Python at least go for PyPy that for speed is similar to Javascript in my experience.