python algorithm graph artificial-intelligence minimax

Finding a suitable move from minimax using the values returned from evaluation function

So, I am trying to implement minimax algorithm for a simple game that has 2 players each having 2 queens. So total 4 queens on a 7X7 board. So on each turn, the players moves both of his queens to a new position.

I try to find the min and max by recursing over, minimax function as shown below. The base case is supposed to return an integer, which is a score returned by the evaluation function. But how do I find the min and the max after I have traversed till the leaf node?

This function should be able to return the best move for queen1 and the queen2. But I do not understand, as to how to proceed to find the min and max from leaf node value. How do I propagte the values. I just cannot understand/visualize this.

Solution

I get the impression from your question that the majority of your confusion is in, what should the function return? Should it return a score, or a move? Generally, you should actually split up this thing in two separate functions;

A minimax() function, which should look mostly like what you seem to have so far (I did not check in detail for correctness, maybe there are little mistakes, but overall it seems close to fine at least). This should only return an integer/float/whatever, the value of a node (which is defined as either the evaluation function if you're deep enough already, or the max/min of all the children (max or min depending on which player is to move).
Something like a choose_move() function, which should return a move to play. It should do this by calling minimax() for all the children, and then returning the move that leads to the child with the greatest value (it's recommended to break ties randomly).

Note: There appear to be some mistakes in your code as well, it seems to return too often. For example, in the case of a maximizing player, you already return when you see for the first time that score > best_val, whereas you should continue looping through all the other moves too to figure out if any of them may have an even higher score.

The code for the case of the minimizing player should be more ''symmetric'' with the code for the maximizing player, it looks too different now.

EDIT: To fix the problem where a score is returned too quickly, this line:

return best_move_q_1, best_move_q_2, score

should simply be moved outside the loops going through all the possible actions. The idea is, loop through all the actions, evaluate them all (through recursive minimax calls), then return the moves and score associated with the best move. This means it has to be outside of the loops through actions, you cannot already return while still inside those loops because then you haven't finished looping through all the actions and may have missed a better alternative.

The way to do this in this case would be to simply move that particular line of code 4 tabs to the left. It should be directly under (at the same indentation level) as the for move_q_1 in moves_1: line, because that's the start of the loop through all moves.

Then, that line should additionally be changed to return best_val (the best score among all the children), instead of score (the evaluation of the last child).

After that, the code for the not maximizing_player case should be changed to be much more similar to the code for the other case above.

Then, I just noticed another thing; close to the top, you decide to evaluate if depth == 0 (or if the game state is terminal). However, in the recursive calls, you always increase the depth level you pass along. This seems strange (unless you pass in a negative depth in the very first call?). You'll probably want to do one of the following instead:

In the very first call to minimax, pass in the maximum depth you want to search for (for example, 3 or 5 or whatever). Then, always decrement it when you recursively call minimax again, instead of incrementing (to make sure it eventually reaches the depth=0 point where it'll evaluate).
Instead of evaluating when depth == 0, do your evaluation when depth == max_depth, where max_depth is again a constant like 3 or 5 or whatever. Then, your initial call to minimax should have depth=0.

I did not check in detail if there are other mistakes, so feel free to let me know if this wasn't all (or try to compare your code to pseudocode of the algorithm in other places, and see where the differences are and if you can understand them).