
Monte Carlo tree search keeps getting stuck in an infinite loop when playing (as opposed to training)


I have made my own implementation of the Monte Carlo tree search algorithm for a simple board game, and it seems to work reasonably well while learning. However, when I switch from training to arena mode for evaluation, the MCTS gets stuck in an infinite loop.

The reason is that during learning, actions are picked pseudo-randomly in proportion to their probabilities, but in arena mode this switches to greedily picking the action most likely to win. Unfortunately, in one of the arena games this means the game ends up in a cycle: a certain board state is reached, and after n actions that same board state is reached again, and again every n actions...

Am I missing a component of the MCTS algorithm that should prevent this from happening? Or is this expected behavior of MCTS, and is it instead the responsibility of the board game to have a draw mechanism built in to detect such repetitions?


Solution

  • This can indeed happen in reinforcement learning. A related symptom is agents not bothering to end the game/episode even when they could easily do so and win.

    Some possible solutions:

    • Modify the reward to give a small penalty to all agents (or only the winning agent) for longer games
    • Modify the environment to terminate after a fixed number of moves with some fixed reward, for example a draw with reward zero.

    Combining both works too, with the latter acting as a failsafe and the former as a slight encouragement during the episode to make progress.
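
    Both fixes can be combined in a thin wrapper around the game environment. This is a minimal sketch under assumed names (`LoopSafeEnv`, an inner env with `reset()`/`step()` returning `(state, reward, done)`); adapt it to whatever interface your own environment uses:

    ```python
    # Hypothetical wrapper combining the two fixes:
    # (a) a small per-move penalty discouraging long games, and
    # (b) a hard move limit that forces a draw (reward 0) as a failsafe.
    # The inner env's reset()/step() interface is an assumption here.
    class LoopSafeEnv:
        def __init__(self, env, step_penalty=0.01, max_moves=200):
            self.env = env
            self.step_penalty = step_penalty
            self.max_moves = max_moves
            self.move_count = 0

        def reset(self):
            self.move_count = 0
            return self.env.reset()

        def step(self, action):
            state, reward, done = self.env.step(action)
            self.move_count += 1
            # (a) small penalty each move nudges agents toward shorter games
            reward -= self.step_penalty
            # (b) failsafe: force termination as a draw after max_moves
            if not done and self.move_count >= self.max_moves:
                reward, done = 0.0, True
            return state, reward, done
    ```

    With this in place, even the cycling arena game above terminates after `max_moves` as a draw, and during training the per-move penalty gives the agents a slight incentive to avoid entering such cycles in the first place.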