How to perform Roll-out in MCTS in Complex Games


Okay, I basically understand how MCTS works, with node selection and so on. What I don't understand is the random roll-out phase. Is it correct that I randomly simulate future game steps until the game ends in a win or a loss? Doesn't the roll-out take very long in more complex games with many states, many possible actions, and unknown enemy moves? And if you roll out enemy moves randomly until you reach the end of the game, isn't that just as good as returning win or loss at random? I would be delighted if someone could explain the roll-out phase with a simple example, such as a 3- or 4-step game.

Thanks in advance.


Solution

  • Simulating a random game is more informative than randomly returning win or loss.

    Imagine a TicTacToe board where one player cannot win anymore, but the other player can. Random rollouts reveal this fact: no simulated game ever ends in a win for the first player.

    In addition, the probability that a sample returns a certain outcome usually carries real information. A position in which you win 90% of all random playouts is often preferable to one in which you win only 10%. This does not hold in general, though: a branch may be a certain win if the single correct response is played, yet still offer many possible paths to defeat, so random playouts would undervalue it.

    Also, one possible improvement to MCTS is to use playouts that are smarter than purely random ones.
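
The TicTacToe case above can be sketched in a few lines of Python. This is only a minimal illustration of the roll-out phase, not a full MCTS. The helper names (`winner`, `rollout`, `estimate`), the board encoding (a 9-character string with `.` for empty cells), and the example position are all assumptions made for this sketch. In the position used, every line already contains an X, so O cannot win, while X can still complete a diagonal.

```python
import random

# The eight winning lines of a 3x3 board, cells indexed 0..8 row by row.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),
         (0, 3, 6), (1, 4, 7), (2, 5, 8),
         (0, 4, 8), (2, 4, 6)]

def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != '.' and board[a] == board[b] == board[c]:
            return board[a]
    return None

def rollout(board, to_move, rng):
    """Play uniformly random moves for both sides until the game ends.

    Returns 'X', 'O', or 'draw'. The input board is not modified.
    """
    board = list(board)
    while True:
        w = winner(board)
        if w is not None:
            return w
        empty = [i for i, cell in enumerate(board) if cell == '.']
        if not empty:
            return 'draw'
        board[rng.choice(empty)] = to_move
        to_move = 'O' if to_move == 'X' else 'X'

def estimate(board, to_move, n=2000, seed=0):
    """Run n random rollouts and return the fraction of each outcome."""
    rng = random.Random(seed)
    counts = {'X': 0, 'O': 0, 'draw': 0}
    for _ in range(n):
        counts[rollout(board, to_move, rng)] += 1
    return {outcome: c / n for outcome, c in counts.items()}

# Hypothetical example position, O to move:
#   X O .
#   O X X
#   O X .
# If O takes the top-right cell, X completes the diagonal and wins.
# If O takes the bottom-right cell, the game ends in a draw.
board = "XO.OXXOX."
print(estimate(board, 'O'))
```

In this position the rollouts never report a win for O, and X wins roughly half of them, which is information a coin flip could not provide. The rollouts are also short here because only two cells are left. In larger games, the cost of playing every simulation to the end is one reason to consider cheaper or smarter playout policies.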