
Alpha beta pruning pseudo code understanding


I am having trouble understanding the following pseudo code from my textbook. In particular, I don't understand the first function. After we compute v ← MAX-VALUE(state, −∞, +∞), how do we proceed to compute the action with value v? What does "the action with value v" even mean? Does it mean to compute MAX-VALUE(RESULT(state, a), −∞, +∞) for each action a, and return the action whose value equals the v from v ← MAX-VALUE(state, −∞, +∞)? Normally, I have no trouble implementing pseudo code from this book, but this time I am severely stuck. Could someone help explain this to me?

function ALPHA-BETA-SEARCH(state) returns an action
 v ← MAX-VALUE(state, −∞, +∞)
 return the action in ACTIONS(state) with value v

function MAX-VALUE(state, α, β) returns a utility value
 if TERMINAL-TEST(state) then return UTILITY(state)
 v ← −∞
 for each a in ACTIONS(state) do
   v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))
   if v ≥ β then return v
   α ← MAX(α, v)
 return v

function MIN-VALUE(state, α, β) returns a utility value
 if TERMINAL-TEST(state) then return UTILITY(state)
 v ← +∞
 for each a in ACTIONS(state) do
   v ← MIN(v, MAX-VALUE(RESULT(state, a), α, β))
   if v ≤ α then return v
   β ← MIN(β, v)
 return v

Solution

  • I believe that if you read this explanation, you will understand the meaning of v and action and be able to implement this pseudo code.

    Let's break this down.

     state - a problem instance
     alpha - the maximum lower bound of possible solutions
     beta - the minimum upper bound of possible solutions

    When a new state is considered as a possible path to the solution, it must have:

    alpha <= price(state) <= beta

    now let us understand the methods:

    1. MAX-VALUE(state, α, β)

    Returns the maximum possible value of the sub-tree rooted in state, with α and β as the alpha and beta bounds (respectively).

    How does it do that? Well, for each state reachable in one step from state, we want to consider the value (or "price") of the action that moves us to that state. The intuition is that if you keep this up, you'll know the price of reaching a final state; do this for all (relevant) final states, and you know the optimal minimax solution.

    In your pseudo code, it is done in the following line:

    for each a in ACTIONS(state) do

    This a is the action that, if applied to the current state, returns the next state. You can imagine this as move player left, for example: the resulting state is the original state, only with the player one step to the left (a silly example, but it might help). Note that it is referred to in your code as RESULT(state, a) - I'll use next_state for simplicity.

    Now let us examine the following line:

    v ← MAX(v, MIN-VALUE(RESULT(state, a), α, β))

    First, let's look at MIN-VALUE(RESULT(state, a), α, β), or better yet: MIN-VALUE(next_state, α, β). Because you are now "exploring" a new path (the one that gets to next_state), you want to take the minimum of all the possibilities from that state (just for the next level). That is because it is the opponent's turn next, and we are assuming that they play as smart as possible (so we know we have the optimal path). Intuitively, this can be seen as "If I choose to go with next_state, what is the worst-case scenario's value?".

    OK, so now, why do we have this MAX(v, ...)? This one is pretty simple: if you have already found a better value v through a different path, just go with it; there is no reason to go with next_state.

    2. MIN-VALUE(state, α, β)

    This is very similar to MAX-VALUE. You can look at this as the "opponent's turn", and because he is "smart", he chooses his next move as the one that gives you the minimum value (thus MIN-VALUE).

    3. Finding the solution:

    function ALPHA-BETA-SEARCH(state) returns an action
     v ← MAX-VALUE(state, −∞, +∞)
     return the action in ACTIONS(state) with value v


    Now, to utilize MAX-VALUE and MIN-VALUE, you just need to find the optimal solution starting from the root state, with no specific alpha or beta limitations (those get updated inside MAX-VALUE and MIN-VALUE), so you assign them the −∞ and +∞ values. To recover the action itself, remember, while looping over the root's actions, which action produced the best value, instead of keeping only the value.
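    To make this concrete, here is a minimal runnable sketch in Python. The tiny game tree and helper names (terminal_test, actions, result) are invented for illustration, not from the book; the key point is that "return the action in ACTIONS(state) with value v" is easiest to implement by running the root-level MAX loop yourself and remembering which action produced the best value:

```python
import math

# Invented toy game tree just to make the sketch runnable:
# MAX moves at the root "A"; leaves carry utilities for MAX.
TREE = {"A": ["B", "C"], "B": ["D", "E"], "C": ["F", "G"]}
UTILITY = {"D": 3, "E": 5, "F": 2, "G": 9}

def actions(state):
    return TREE[state]          # an "action" here is just picking a child

def result(state, action):
    return action               # applying the action yields that child state

def terminal_test(state):
    return state in UTILITY

def max_value(state, alpha, beta):
    if terminal_test(state):
        return UTILITY[state]
    v = -math.inf
    for a in actions(state):
        v = max(v, min_value(result(state, a), alpha, beta))
        if v >= beta:
            return v            # beta cutoff: MIN would never allow this
        alpha = max(alpha, v)
    return v

def min_value(state, alpha, beta):
    if terminal_test(state):
        return UTILITY[state]
    v = math.inf
    for a in actions(state):
        v = min(v, max_value(result(state, a), alpha, beta))
        if v <= alpha:
            return v            # alpha cutoff: MAX would never allow this
        beta = min(beta, v)
    return v

def alpha_beta_search(state):
    # Root-level MAX loop done by hand, remembering *which* action
    # produced the best value instead of only the value itself.
    best_action, alpha = None, -math.inf
    for a in actions(state):
        value = min_value(result(state, a), alpha, math.inf)
        if value > alpha:
            alpha, best_action = value, a
    return best_action

print(max_value("A", -math.inf, math.inf))  # → 3
print(alpha_beta_search("A"))               # → B
```

    With this shape there is no need to re-search for "the action with value v" after the fact: the best action falls out of the same root loop that computes v.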

    So, what do you need to implement this?

    1. ACTIONS(state)

    Each state needs to have a set of operators, each returning a new state, which is the original state after applying the logic of the operator.

    2. RESULT(state, action)

    Following the above explanation, this can simply be seen as state.action() or action(state): just applying the action to the state and returning the new state.

    3. TERMINAL-TEST(state)

    A predicate answering the question "Is this a terminal state?". You need to be able to end your current branch search somewhere - this predicate is the break-condition for each branch.
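    For instance, these three pieces might look like this for a hypothetical two-player "subtraction game" (an invented example, not from the book), where players alternately remove 1 or 2 stones from a pile and whoever takes the last stone wins:

```python
# Hypothetical example: a state is (stones_left, player_to_move);
# a move removes 1 or 2 stones, and the player taking the last stone wins.

def terminal_test(state):
    # TERMINAL-TEST: the game ends when no stones remain.
    stones, _ = state
    return stones == 0

def utility(state):
    # Utility from MAX's point of view: if it is MAX's turn on an empty
    # pile, then MIN took the last stone and won.
    _, to_move = state
    return -1 if to_move == "MAX" else 1

def actions(state):
    # ACTIONS: you may remove 1 or 2 stones (never more than remain).
    stones, _ = state
    return [n for n in (1, 2) if n <= stones]

def result(state, action):
    # RESULT: apply the action and pass the turn to the other player.
    stones, to_move = state
    return (stones - action, "MIN" if to_move == "MAX" else "MAX")

# With 3 stones and MAX to move:
print(actions((3, "MAX")))    # → [1, 2]
print(result((3, "MAX"), 2))  # → (1, 'MIN')
```

    Any game fits this interface as long as you can enumerate moves, apply a move, and detect (and score) the end of the game.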

    I hope this helped clear things up a little bit.