
Is the tree data structure needed in the alpha-beta pruning algorithm?


The alpha-beta pruning algorithm is as follows:

function ALPHA-BETA-SEARCH(state) returns an action
      v <- MAX-VALUE(state, -∞, +∞)
      return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then 
        return UTILITY(state)
    v <- -∞
    for each a in ACTIONS(state) do
        v <- MAX(v, MIN-VALUE(RESULT(state, a), α, β))
        if v ≥ β then
            return v
        α <- MAX(α, v)
    return v

function MIN-VALUE(state, α, β) returns a utility value
    if TERMINAL-TEST(state) then 
        return UTILITY(state)
    v <- +∞
    for each a in ACTIONS(state) do
        v <- MIN(v, MAX-VALUE(RESULT(state, a), α, β))
        if v ≤ α then 
            return v
        β <- MIN(β, v)
    return v
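For concreteness, here is a direct Python translation of the two pseudocode functions above. The toy game (the GAME and UTILITY tables and the state names) is a made-up example just to exercise the code, not part of the algorithm itself:

```python
import math

# Hypothetical toy game, for illustration only: each non-terminal state maps
# its actions to successor states; terminal states map to utility values.
GAME = {
    "root": {"left": "A", "right": "B"},
    "A": {"a1": "A1", "a2": "A2"},
    "B": {"b1": "B1", "b2": "B2"},
}
UTILITY = {"A1": 3, "A2": 12, "B1": 2, "B2": 4}

def terminal_test(state):
    return state in UTILITY

def max_value(state, alpha, beta):
    if terminal_test(state):
        return UTILITY[state]
    v = -math.inf
    for a, successor in GAME[state].items():
        v = max(v, min_value(successor, alpha, beta))
        if v >= beta:
            return v          # beta cutoff: MIN will never let play reach here
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if terminal_test(state):
        return UTILITY[state]
    v = math.inf
    for a, successor in GAME[state].items():
        v = min(v, max_value(successor, alpha, beta))
        if v <= alpha:
            return v          # alpha cutoff: MAX already has something better
        beta = min(beta, v)
    return v
```

Notice that the states are only generated on demand while looping over actions; nothing resembling a tree is ever stored.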

The algorithm states that the ACTIONS function will give a list of all the available actions in a given state.

Take, for example, the game of checkers. Suppose that one checker, say A, is diagonal to another checker, say B. If A can take B, then that capture is an action — and a forced one, since in checkers we must take an opponent's checker if we can. If there are multiple possible captures, each of them is an action. This situation can actually be drawn using pencil and paper. More specifically, the situation could be represented using a tree, where each node represents a state and the edges to its child nodes represent the possible actions from that state.

I think you don't need to explicitly store a tree data structure. However, the algorithm above contains the statement return the action in ACTIONS(state) with value v, and ACTIONS(state) returns the possible actions from a certain state (e.g. the state where A needs to play).

If we run the whole algorithm, we obtain a value v, which was propagated up from some terminal node.

Now, suppose I do not store the full tree of all possible moves or actions from every state. When return the action in ACTIONS(state) with value v is executed, I only have the actions that lead from the current state to the next one, and I know that one of those actions leads to the best possible path. But without extra bookkeeping, e.g. the full tree, will I be able to match an action with the value v?


Solution

  • You shouldn't build an explicit tree structure in a minimax algorithm, and in practical situations you can't. A minimax search with a depth bound d and a branching factor b traverses a tree with O(bᵈ) nodes, which very quickly becomes too large to store. (In the version you posted there isn't even a depth bound, so you would generate the entire game tree.)

    The way to keep track of the state is to rewrite the top-level ALPHA-BETA-SEARCH as

    function ALPHA-BETA-SEARCH(state) returns an action
        best_a <- nil
        max_v <- -∞
        for each a in ACTIONS(state) do
            v <- MIN-VALUE(RESULT(state, a), max_v, +∞)
            if v > max_v then
                max_v <- v
                best_a <- a
        return best_a
    

    That is, you "unroll" the top call to MAX-VALUE in the main function.

    Note that the function above loops over each action a and computes its value v. Whenever a value v is the maximum found so far, the action that produced it becomes the current best_a.
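A runnable Python sketch of this unrolled top level, again using a made-up toy game purely for illustration:

```python
import math

# Hypothetical toy game: non-terminal states map actions to successors,
# terminal states have utilities.
GAME = {
    "root": {"left": "A", "right": "B"},
    "A": {"a1": "A1", "a2": "A2"},
    "B": {"b1": "B1", "b2": "B2"},
}
UTILITY = {"A1": 3, "A2": 12, "B1": 2, "B2": 4}

def max_value(state, alpha, beta):
    if state in UTILITY:
        return UTILITY[state]
    v = -math.inf
    for successor in GAME[state].values():
        v = max(v, min_value(successor, alpha, beta))
        if v >= beta:
            return v
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if state in UTILITY:
        return UTILITY[state]
    v = math.inf
    for successor in GAME[state].values():
        v = min(v, max_value(successor, alpha, beta))
        if v <= alpha:
            return v
        beta = min(beta, v)
    return v

def alpha_beta_search(state):
    # The "unrolled" top-level MAX call: remember which action produced the
    # best value seen so far, so no tree ever needs to be stored.
    best_a, max_v = None, -math.inf
    for a, successor in GAME[state].items():
        # alpha = best value found so far, beta = +infinity at the root
        v = min_value(successor, max_v, math.inf)
        if v > max_v:
            max_v, best_a = v, a
    return best_a
```

Because best_a is updated in the same loop that computes each v, the value and the action are matched directly, with no extra bookkeeping.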