
Output of Artificial Neural Network in Othello


I'm implementing Othello using an artificial neural network. While reading a document (here, page 19), I didn't understand some points. They calculate the output as follows: [image of the output formula, not available]. If they calculate it that way, I don't know how my AI knows which moves are legal in the game so that it can choose the best legal move. That output is only a single float number (I think), so how can I use it?


Solution

    The good news

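    It's super simple: the neural network (NN) is a value network rather than a policy network. A value network takes a board state as input and outputs a single score describing how good that position is. It is the basic building block of Minimax-based game AIs, where it is usually called the evaluation function. (A policy network, by contrast, would output a probability distribution over all possible moves.) So the network never picks a move by itself and does not need to know the rules: your game code generates the legal moves, plays each one, and asks the NN to score the resulting positions.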
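To make "one float out" concrete, here is a minimal sketch of a value network's forward pass in plain Python. It assumes a hypothetical board encoding of 64 cells (+1 for your disc, -1 for the opponent's, 0 for empty) and a hypothetical 64 -> 32 -> 1 architecture with tanh activations; the weights are random and untrained, so the score itself is meaningless, only its shape matters.

```python
import math
import random

# Assumed encoding (hypothetical): 64 cells, +1 = own disc, -1 = opponent disc, 0 = empty.
BOARD_CELLS = 64


def dense(inputs, weights, biases, activation):
    """Fully connected layer: neuron j outputs activation(sum_i weights[j][i] * inputs[i] + biases[j])."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (untrained, for illustration only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class ValueNetwork:
    """Hypothetical 64 -> hidden -> 1 value network: a board goes in, ONE score comes out."""

    def __init__(self, hidden=32, seed=0):
        rng = random.Random(seed)
        self.w1, self.b1 = init_layer(BOARD_CELLS, hidden, rng)
        self.w2, self.b2 = init_layer(hidden, 1, rng)

    def evaluate(self, board):
        hidden = dense(board, self.w1, self.b1, math.tanh)
        (score,) = dense(hidden, self.w2, self.b2, math.tanh)  # single output neuron
        return score  # float in (-1, 1): higher = better for the +1 player
```

Calling `ValueNetwork().evaluate(board)` on the starting position returns one number. Nothing in the network lists moves; that part has to come from the game rules.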

    So the NN gives you this score, and you combine it with a search algorithm of your choice. Minimax (used by nearly all chess AIs) and MCTS (used by AlphaGo) are the most common.
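The simplest way to use the score, and the direct answer to "how does it know the legal moves", is a one-ply search: the Othello rules produce the legal moves, each move is played on a copy of the board, and the evaluator scores the result. This sketch assumes the same hypothetical 64-cell encoding (index = row * 8 + column, +1 for the player to move) and takes any `evaluate(board)` function that scores a board from that player's point of view; a trained `ValueNetwork.evaluate` would go there, while the usage below plugs in plain disc difference (`sum`) as a stand-in.

```python
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def flips(board, move, player):
    """Squares flipped if `player` plays at `move`; an empty list means the move is illegal."""
    if board[move] != 0:
        return []
    r0, c0 = divmod(move, 8)
    flipped = []
    for dr, dc in DIRECTIONS:
        r, c = r0 + dr, c0 + dc
        line = []
        # walk over a run of opponent discs...
        while 0 <= r < 8 and 0 <= c < 8 and board[r * 8 + c] == -player:
            line.append(r * 8 + c)
            r, c = r + dr, c + dc
        # ...which only counts if it is closed by one of our own discs
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r * 8 + c] == player:
            flipped.extend(line)
    return flipped


def legal_moves(board, player):
    return [m for m in range(64) if flips(board, m, player)]


def play(board, move, player):
    """Return a new board with the move applied (the input board is not modified)."""
    new = list(board)
    new[move] = player
    for sq in flips(board, move, player):
        new[sq] = player
    return new


def choose_move(board, player, evaluate):
    """1-ply search: the rules supply the legal moves, the evaluator only scores results."""
    moves = legal_moves(board, player)
    if not moves:
        return None  # no legal move: the player has to pass
    return max(moves, key=lambda m: evaluate(play(board, m, player)))


start = [0] * 64
start[28] = start[35] = 1    # black (+1, to move) on e4 and d5
start[27] = start[36] = -1   # white on d4 and e5
print(legal_moves(start, 1))          # [19, 26, 37, 44]
print(choose_move(start, 1, sum))     # 19 (all four moves tie on disc count)
```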

    Basic idea of Minimax: play a move, let the opponent play a move, (repeat), then evaluate the resulting position with your NN. Do this for all possible move sequences and propagate the scores back up with Minimax. Because every NN evaluation is relatively expensive, only a few plies (half-moves) of lookahead will be possible, but it can still be very strong for Othello, and it's easy to implement.
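A depth-limited Minimax can be sketched generically, with the game supplied as three functions so the same code works with the Othello helpers and the NN evaluator from the sketches above. The parameter names `moves`, `apply_move` and `evaluate` are hypothetical; `evaluate` is assumed to score leaves from the maximizing player's point of view. As a simplification, a state with no legal moves is treated as a leaf, whereas real Othello would make that player pass.

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Depth-limited Minimax. Returns (value, best_move); best_move is None at a leaf."""
    legal = moves(state)
    if depth == 0 or not legal:  # simplification: no pass handling
        return evaluate(state), None
    best_move = None
    if maximizing:
        best = float("-inf")
        for m in legal:
            value, _ = minimax(apply_move(state, m), depth - 1, False, moves, apply_move, evaluate)
            if value > best:
                best, best_move = value, m
    else:
        best = float("inf")
        for m in legal:
            value, _ = minimax(apply_move(state, m), depth - 1, True, moves, apply_move, evaluate)
            if value < best:
                best, best_move = value, m
    return best, best_move


# Toy two-ply tree: MAX picks "a" or "b", then MIN picks a leaf.
tree = {"": ["a", "b"], "a": ["a1", "a2"], "b": ["b1", "b2"]}
leaf_scores = {"a1": 3, "a2": 5, "b1": 2, "b2": 9}
print(minimax("", 2, True, lambda s: tree.get(s, []), lambda s, m: m,
              lambda s: leaf_scores.get(s, 0)))  # (3, 'a')
```

MIN would answer "b" with b1 (score 2), so MAX prefers "a", whose worst case is 3. Alpha-beta pruning returns the same result while skipping branches, which matters when every leaf costs an NN evaluation.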

    Basic idea of MCTS: play a random move, play another random move, (repeat) until the game ends, and record who won. Then compare the average results of all possible first moves and pick the best one. (Incorporating the NN as a heuristic is harder here.)
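The procedure described above, random playouts per first move and then comparing averages, is strictly speaking flat Monte Carlo; full MCTS additionally grows a search tree and chooses which branch to explore next with a rule such as UCT. The sketch below implements the flat version on a hypothetical toy game (one-pile Nim: remove 1 to 3 stones, whoever takes the last stone wins) so that it stays short and self-contained; for Othello you would swap in the move generator and a game-over check.

```python
import random


# Hypothetical toy game standing in for Othello: one-pile Nim.
def nim_moves(stones):
    return [k for k in (1, 2, 3) if k <= stones]


def mover_wins_random_playout(stones, rng):
    """Play random moves until the game ends (stones > 0 on entry).
    Returns True if the player to move at the start takes the last stone."""
    mover = True  # True while it is the starting player's turn
    while True:
        stones -= rng.choice(nim_moves(stones))
        if stones == 0:
            return mover
        mover = not mover


def flat_monte_carlo(stones, playouts=200, seed=0):
    """Score every first move by its average playout result; return (best_move, win_rates)."""
    rng = random.Random(seed)
    win_rates = {}
    for first in nim_moves(stones):
        remaining = stones - first
        wins = 0
        for _ in range(playouts):
            if remaining == 0:
                wins += 1  # our first move took the last stone
            elif not mover_wins_random_playout(remaining, rng):
                wins += 1  # the opponent (to move next) did not take the last stone
        win_rates[first] = wins / playouts
    best_move = max(win_rates, key=win_rates.get)
    return best_move, win_rates
```

From 3 stones, taking all 3 wins every playout, taking 2 always loses, and taking 1 wins roughly half the random playouts, so `flat_monte_carlo(3)` picks 3.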

    The calculation you mentioned is just the standard neural-network rule for computing the activations of a dense (fully connected) layer: a weighted sum of the inputs plus a bias, passed through an activation function.
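Assuming the image in the question showed this usual dense-layer formula (the original image is not reproduced here), it reads as follows, with inputs $x_i$ (for example the 64 board cells), weights $w_{ij}$, bias $b_j$ and an activation function $f$ such as tanh or the sigmoid:

```latex
o_j = f\left( \sum_{i=1}^{n} w_{ij}\, x_i + b_j \right)
```

A value network's last layer has a single neuron $j$, so the whole network ends in exactly one number $o$: the score of the position, not a move.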

    The bad news

    I didn't read the paper, but the hard part is preparing and training your NN. You need to provide training data. That can be supervised learning (if you have historical games; easier) or reinforcement learning (Q-learning and co.). This will be very hard to do without experience.
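For the supervised route, a common simplification is to label every position of a recorded game with that game's final result and regress the evaluator onto those labels. The sketch below assumes a hypothetical data format (each game as a list of encoded boards plus a result of +1, 0 or -1 for the +1 player) and, to stay short, trains a linear evaluator with plain SGD on squared error instead of a full NN; a real network would minimize the same loss via backpropagation.

```python
import random


def positions_with_labels(games):
    """Hypothetical format: each game is (boards, result), result = +1 / 0 / -1 for the +1 player.
    Every position in a game gets that game's final result as its training target."""
    return [(board, result) for boards, result in games for board in boards]


def predict(w, b, x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def train_linear_value(data, n_features, epochs=200, lr=0.05, seed=0):
    """Fit score = w . x + b to the targets with plain SGD on squared error."""
    rng = random.Random(seed)
    data = list(data)  # copy so shuffling does not reorder the caller's list
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, target in data:
            err = predict(w, b, x) - target
            # gradient of 0.5 * err**2 is err * x_i for w_i and err for b
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b
```

Labeling early positions with the final result is noisy, which is one reason training is hard in practice; reinforcement-learning methods such as TD learning bootstrap from later evaluations instead.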

    I think I know all the theory needed, but I still failed to do this for some other (stochastic) games: there are many issues with autocorrelation and the like, and a lot of hyperparameter tuning is needed.

    Conclusion

    This project is fairly complicated and has many pitfalls. Please make sure you understand the algorithms you want to try. It looks like you are missing some of the basics: game theory (Minimax), AI/learning theory (MCTS, Markov decision processes, Q-learning, ...) and neural networks (their basic internals).