Search code examples
c++artificial-intelligenceminimaxalpha-beta-pruning

MiniMax with Alpha Beta Pruning for Othello not working


I have the following implementation of a alpha beta minimax for an othello (reversi) game. Somehow, this never really returns the proper action to take. It seems to return the default action I put in the function (0, 0) and the secondary value of -32768, which means it got pruned at the MAX subroutine. Any tips on what I can improve with this and how I can fix this problem?

Note: I've identified the successors being returned properly for the most part. The max depth for now is 8. Computer player's pn (player number) is 1 and the human player's is 0. The first stage, 0, is MINIMAX_MAX. Alpha and beta are initially set to INT_MIN and INT_MAX respectively.

mm_out minimax(Grid& G, int alpha, int beta, Action& A, uint pn, uint depth, bool stage) {
    if (G.check_terminal_state() || depth == MAX_DEPTH) {
#ifdef DEBUG
        cout << "best action: (" << A.get_x() << ", " << A.get_y() << ")\n";
#endif
        return mm_out(A, G.get_utility(pn));
    }

    // add end game score total here

#ifdef DEBUG
    if (stage == MINIMAX_MAX) {
        cout << "max " << alpha << " " << beta << "\n";
    }
    else {
        cout << "min " << alpha << " " << beta << "\n";
    }
#endif

    set<Action> succ_temp = G.get_successors(pn);
    for (Action a : succ_temp) {

#ifdef DEBUG
        cout << a.get_x() << " " << a.get_y() << '\n';
#endif

        Grid gt(G);
        a.evaluate(gt);
    }
    set<Action, action_greater> successors(succ_temp.begin(), succ_temp.end());

#ifdef DEBUG
    Player p(0, "minimaxtest");
    G.display(p);
    int test;
    cin >> test;
#endif

    // if no successor, that player passes
    if (successors.size()) {
        for (auto a = successors.begin(); a != successors.end(); ++a) {
            Grid gt(G);
            gt.do_move(pn, a->get_x(), a->get_y(), !PRINT_ERR);
            Action at = *a;
            mm_out mt = minimax(gt, alpha, beta, at, pn ^ 1, depth + 1, !stage);
            int temp = mt.val;
//          A = mt.best_move;

            if (stage == MINIMAX_MAX) {
                if (alpha < temp) {
                    alpha = temp;
                    A = *a;
#ifdef DEBUG
                    cout << "Current action: (" << A.get_x() << ", " << A.get_y() << ") alpha = " << alpha << "\n";
#endif
                }
                if (alpha >= beta) {
#ifdef DEBUG
                    cout << "pruned at max\n";
#endif
                    return mm_out(A, beta);
                }
            }
            else {
                if (beta > temp) {
                    beta = temp;
                    A = *a;
#ifdef DEBUG
                    cout << "Current action: (" << A.get_x() << ", " << A.get_y() << ") beta = " << beta << "\n";
#endif
                }
                if (alpha >= beta) {
#ifdef DEBUG
                    cout << "pruned at min\n";
#endif
                    return mm_out(A, alpha);
                }


}
    }
    return mm_out(A, (stage == MINIMAX_MAX) ? alpha : beta);
}
else {
    cout << "no successor\n";
    return mm_out(A, (stage == MINIMAX_MAX) ? (std::numeric_limits<int>::max() - 1) : (std::numeric_limits<int>::min() + 1));
}

}

Utility function:

int Grid::get_utility(uint pnum) const {
    if (pnum)
        return wcount - bcount;
    return bcount - wcount;
}

Solution

  • You should pass the alpha / beta parameters by value (not by reference):

    mm_out minimax(Grid& G, int alpha, int beta, Action& A, uint pn, uint depth, bool stage)
    

    Each node passes the alpha and beta values to its children. The children then update their own copies of the alpha or beta value depending on whose turn it is and return the final evaluation of that node. That is then used to update the alpha or beta value of the parent.