Tags: python, keras, deep-learning, q-learning

Deep Q Learning Approach for the card game Schnapsen


So I have a DQN agent that plays the card game Schnapsen. I won't bore you with the details of the game, as they are not really related to the question I am about to ask. The only important point is that in every round of the game there is a specific set of valid moves a player can take. The DQN agent I have created sometimes outputs an invalid move, in the form of an integer. There are 28 possible moves in the entire game, so sometimes it outputs a move that cannot be played in the current state of the game, for example playing the Jack of Diamonds when that card is not in its hand. Is there any way for me to "map" the output of the neural network onto the most similar valid move whenever it produces an invalid one? Would that be the best approach to this problem, or do I have to tune the neural network better?

As of right now, whenever the DQN agent does not output a valid move, it falls back on another algorithm, a Bully Bot implementation that plays one of the valid moves. Here is the link to my GitHub repo with the code. To run the setup where the DQN agent plays against the bully bot, navigate into the executables folder and run: python cli.py bully-bot


Solution

  • One approach to mapping the outputs of your neural network to the most similar valid move is to apply a softmax to the raw outputs, converting them into a probability distribution over the possible moves, and then select the highest-probability move that is also valid.
  • Another approach is to use argmax, which returns the index of the maximum output value. You then check whether that index corresponds to a valid move; if it does not, you fall back to the next-highest-valued index that does correspond to a valid move.

  Both approaches amount to masking out the invalid actions before choosing; see the sketch after this list.
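
Here is a minimal sketch of both ideas, assuming your network outputs a vector of 28 raw scores (one per move) and that you already have some way to obtain the indices of the moves that are legal in the current state (the function names and the `valid_moves` argument here are hypothetical, not part of your repo):

```python
import numpy as np

def masked_argmax(q_values, valid_moves):
    """Pick the highest-valued move among the valid ones.

    q_values    -- 1-D array of raw network outputs, one entry per move
    valid_moves -- iterable of integer indices that are legal in this state
    """
    valid = list(valid_moves)
    masked = np.full_like(q_values, -np.inf)  # invalid moves can never win argmax
    masked[valid] = q_values[valid]
    return int(np.argmax(masked))

def masked_softmax_sample(q_values, valid_moves, temperature=1.0):
    """Sample a valid move from a softmax taken over the valid outputs only."""
    valid = np.asarray(list(valid_moves))
    logits = q_values[valid] / temperature
    logits -= logits.max()                    # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return int(np.random.choice(valid, p=probs))

# Example: 28 raw outputs, but only moves 3, 7, and 12 are legal this round
q = np.random.randn(28)
print(masked_argmax(q, [3, 7, 12]))
print(masked_softmax_sample(q, [3, 7, 12]))
```

If you apply the same mask consistently whenever the agent selects an action (and, optionally, when computing bootstrap targets during training), the network never has to be "tuned" to avoid impossible moves at all, which is usually simpler and more reliable than falling back to a separate bot.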