I'm trying to build a game of Connect 4 with minimax (and alpha beta pruning), mostly to prove to myself that I can do it. However, the one big conceptual problem I'm having is with how to actually utilize the minimax algorithm. The way I do it is that I have an AI class that has one function which is to perform the minimax algorithm that returns an int.
public int minimax(Board board, int depth, int alpha, int beta, String player) {
if(depth == 0 || board.getScore() >= 512) {
return board.getScore();
}
else if(player.equals("computer")) {
int temp = -1000000;
for(Integer[] moves : board.availableMoves) {
board.putPiece(player, moves[0]);
temp = Math.max(temp, minimax(board, depth-1, alpha, beta, "human"));
board.removePiece(moves[0], moves[1]);
alpha = Math.max(alpha, temp);
if (alpha >= beta) {
break;
}
}
return temp;
}
else {
int temp = 1000000;
for(Integer[] moves : board.availableMoves) {
board.putPiece(player, moves[0]);
temp = Math.min(temp, minimax(board, depth+1, alpha, beta, "computer"));
board.removePiece(moves[0], moves[1]);
beta = Math.min(beta, temp);
if(alpha >= beta) {
break;
}
}
return temp;
}
}
This is called by a function of the Game class called computerMove().
public int computerMove() {
Board tempBoard = board;
int bestMove = 0;
AI ai = new AI();
ai.minimax(board, difficulty, -1000000, 1000000, "computer");
return bestMove;
}
But, what do I do with the int that is returned? How do I utilize that to actually move the piece? The int that is returned is simply the best possible board I could get, right? It tells me nothing in particular about the location or board that I should do.
Any and all help is greatly appreciated.
Thanks,
The books all say to return just the score, but that's impractical for actually playing the game. Of course the overhead of maintaining the best move everywhere can really slow down the program, so generally you use a driver function that does the first level of expansion, and additionally keeps track of the best move. This is effectively wrapping the implementation in an argmax
function, which is just a fancy way of saying it returns the best move at the top level instead of the score. You can see an example of this in a little project I worked on last year. The code is in C# but it's close enough to Java for you to get the idea.
Alternatively, you can modify the code to return a tuple (class with multiple fields) that has the score and the best move. This is easier (and a little cleaner IMO) than writing the argmax wrapper, but without some extra engineering this will probably result in some noticeable slow down of the minimax function because it's going to result in tons more allocations. If performance isn't your top priority, this is probably the way to go.
I should also point out, that your implementation has at least one bug. The depth should always be decreasing regardless of who is playing and in your human branch you have it increase for the human player. This means the depth will never get to 0 and the base case will only be hit when a player is determined to be the winner. Additionally, when using alpha beta, it's important that the board evaluation knows whose turn it is and who is the maximizing player, else you'll run into lots of hard to find bugs. You don't show that code here, but I want to point that out because it gets me every time.