Tags: machine-learning, neural-network, reinforcement-learning

Choosing a neural network architecture for Snake AI Agent


I'm new to machine learning and reinforcement learning, and I'm attempting to create an AI agent that learns to play Snake. I'm having trouble choosing or designing a neural network architecture that works with the shape of my input and output vectors.

My input is a 3x10x10 tensor: three layers of the 10x10 grid the snake moves on. All values are 0s and 1s; the first layer marks the positions of the snake's body parts, the second layer marks the apple's position, and the third marks the snake's head position.
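For concreteness, the encoding described above could be sketched like this (the coordinates and function name are hypothetical, just for illustration):

```python
import numpy as np

GRID = 10

def encode_state(body, apple, head):
    """Build a 3x10x10 tensor of 0s and 1s:
    channel 0 marks the snake's body, channel 1 the apple, channel 2 the head."""
    state = np.zeros((3, GRID, GRID), dtype=np.float32)
    for (r, c) in body:
        state[0, r, c] = 1.0
    state[1, apple[0], apple[1]] = 1.0
    state[2, head[0], head[1]] = 1.0
    return state

# Example: a 3-segment snake with its head at (4, 5) and the apple at (2, 7)
s = encode_state(body=[(4, 5), (4, 4), (4, 3)], apple=(2, 7), head=(4, 5))
print(s.shape)  # (3, 10, 10)
```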

For my output, I'm looking for a vector of 4 values, corresponding to the 4 possible moves a player has available (change direction to up / down / left / right).

I would appreciate any recommendations on how to go about choosing an architecture in this case, as well as any thoughts on the way I chose to encode my game state into an input tensor for the agent to train on.


Solution

  • You could start with a ResNet architecture and see what happens. A ResNet takes as input an image of shape HxWxC, where H is the height, W the width, and C the number of channels. In your case you don't have an actual image, but you still encode your environment as 3 channels with HxW = 10x10, so your encoding should work.

    You will also have to change the ResNet's output layer so that it produces only 4 values, one per action.
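    A minimal sketch of this adaptation, assuming PyTorch and torchvision are available (the exact architecture choice is up to you):

    ```python
    import torch
    import torchvision.models as models

    # Take a standard ResNet-18 and swap its final fully connected layer
    # so it outputs 4 values, one per action (up / down / left / right).
    net = models.resnet18(weights=None)  # train from scratch, no pretrained weights
    net.fc = torch.nn.Linear(net.fc.in_features, 4)

    x = torch.zeros(1, 3, 10, 10)  # a batch of one encoded game state
    q_values = net(x)
    print(q_values.shape)  # torch.Size([1, 4])
    ```

    Note that PyTorch expects channels-first input (CxHxW), which matches the 3x10x10 encoding directly.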

    Given that the input space is not that big, you could start with ResNet-18, the smallest standard variant, and see how it performs. Since you are new to ML and RL, there is a classic paper that solves Atari games using deep reinforcement learning (https://arxiv.org/pdf/1312.5602v1.pdf), and the method is not that hard to understand. Snake has similar (or even lower) complexity than the Atari games, so this paper may provide more insight.
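    The core update from that paper (DQN) is simple enough to sketch here: the network's 4 outputs are treated as Q-values, and the training target for the taken action is the reward plus the discounted best Q-value of the next state. A hedged sketch, with hypothetical helper names:

    ```python
    import numpy as np

    def q_target(reward, next_q_values, done, gamma=0.99):
        """DQN target: y = r + gamma * max_a' Q(s', a'), or just r if the episode ended."""
        if done:
            return reward
        return reward + gamma * float(np.max(next_q_values))

    def select_action(q_values, epsilon, rng):
        """Epsilon-greedy choice over the network's 4 action outputs."""
        if rng.random() < epsilon:
            return int(rng.integers(4))     # explore: random move
        return int(np.argmax(q_values))     # exploit: best predicted move

    rng = np.random.default_rng(0)
    print(q_target(1.0, np.array([0.5, 2.0, -1.0, 0.0]), done=False))  # 2.98
    ```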