artificial-intelligence tic-tac-toe minimax alpha-beta-pruning

Can a cache be used for an alpha-beta search algorithm?

I'm working on a minimax tic-tac-toe algorithm. I got it working fine, caching each state in the tree.

Then I implemented alpha-beta pruning, which seemed to change the moves the algorithm chooses. I think the problem is that a cached node cannot be "trusted" if any of its descendants (children, grandchildren, etc.) were pruned. Is this true?

For now, I'm only caching states if they don't have pruned descendants. This image shows my point (not tic tac toe). The max player is the upwards triangle, which should choose the move on the left. However, if the move on the right is cached during alpha-beta pruning, the red triangle will have a false value of 4, so the move on the right would be wrongly chosen.

Solution

If by a "cache" you mean a transposition table, then you can't always trust the value in the transposition table. That is, when you store a value in a transposition table, you need to also store the alpha and beta values (perhaps the depth as well) used for the search below that state. If the alpha and beta values are not the same*, then you can't use the value from the transposition table.

*In practice they don't have to be identical. The window stored in the table just needs to contain the window at the current node, i.e. stored alpha <= current alpha and current beta <= stored beta.
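
As a minimal sketch of that rule for tic-tac-toe: each table entry stores the window it was searched with, and it is reused only when that window contains the current one. The names `table`, `alphabeta` and `winner`, the 9-character string board, and the scores +1/0/-1 are all assumptions made for illustration, not taken from your code.

```python
# Board is a 9-character string of "X", "O" and "." (empty); X maximizes.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

# Hypothetical transposition table: (board, player) -> (alpha, beta, value)
table = {}

def winner(board):
    """Return "X" or "O" if that player has three in a row, else None."""
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def alphabeta(board, player, alpha, beta):
    """Value of `board` with `player` to move: +1 X wins, 0 draw, -1 O wins."""
    won = winner(board)
    if won is not None:
        return 1 if won == "X" else -1
    if "." not in board:
        return 0

    key = (board, player)
    entry = table.get(key)
    if entry is not None:
        stored_alpha, stored_beta, stored_value = entry
        # Reuse only if the stored window contains the current window.
        if stored_alpha <= alpha and beta <= stored_beta:
            return stored_value

    lo, hi = alpha, beta
    maximizing = player == "X"
    best = -2 if maximizing else 2
    for i, cell in enumerate(board):
        if cell != ".":
            continue
        child = board[:i] + player + board[i + 1:]
        value = alphabeta(child, "O" if maximizing else "X", lo, hi)
        if maximizing:
            best = max(best, value)
            lo = max(lo, best)
        else:
            best = min(best, value)
            hi = min(hi, best)
        if lo >= hi:
            break  # remaining moves are pruned

    # Store the window the search started with, not the narrowed one.
    table[key] = (alpha, beta, best)
    return best
```

Calling `alphabeta("." * 9, "X", -2, 2)` on the empty board should return 0 (a draw), the same as plain minimax.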

Edit: Additional info for those dealing with this in larger games. When you search at a node you have a lower bound (alpha) and an upper bound (beta) on the final value. If the returned value is strictly between alpha and beta, then you know it is the true value of the state. If it is equal to alpha or beta, then you know it is only a bound on the final value: a result at alpha means the true value is at most that, and a result at beta means it is at least that. You can still use this information to help the search.
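
A small sketch of that classification, assuming a hypothetical helper `classify` (the name and the string labels are mine). It uses `<=` and `>=` so that it also covers searches that can return values outside the window:

```python
def classify(value, alpha, beta):
    """Say what a search result proves about the true value of the state."""
    if value <= alpha:
        return "upper"   # search failed low: true value <= value
    if value >= beta:
        return "lower"   # search failed high: true value >= value
    return "exact"       # strictly inside the window: value is the true value
```

For example, `classify(15, 10, 20)` gives `"exact"`, while `classify(12, 12, 30)` gives `"upper"`.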

In particular, suppose that you have alpha=10 and beta=20 in the current search and the value in the transposition table is [alpha = 12, beta = 30, value = 12]. Then, when you (re-)search below the branch, you can search with bounds of alpha=10 and beta=12.

This is because the stored value equals the stored alpha (12), so the previous search has already proven that the value is <= 12. When you get the final result, you can then update the transposition table entry to reflect the additional information from this search.
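
A sketch of how the window could be narrowed from a stored entry, assuming entries are `(alpha, beta, value)` tuples and a hypothetical helper named `narrow_window`:

```python
def narrow_window(alpha, beta, entry):
    """Tighten the current (alpha, beta) using a stored (alpha, beta, value)."""
    stored_alpha, stored_beta, value = entry
    if value <= stored_alpha:
        # Earlier search failed low: the true value is <= value.
        beta = min(beta, value)
    elif value >= stored_beta:
        # Earlier search failed high: the true value is >= value.
        alpha = max(alpha, value)
    else:
        # Earlier search was exact: the window collapses around value.
        alpha = max(alpha, value)
        beta = min(beta, value)
    return alpha, beta
```

With the numbers above, `narrow_window(10, 20, (12, 30, 12))` returns `(10, 12)`. If the returned alpha is >= the returned beta, the stored value already settles this node for the current window and no re-search is needed.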