Tags: python, machine-learning, pygame, keras, reinforcement-learning

Python game neural network: how to set up the inputs


I am in the process of making a tank game in pygame where you move a tank around walls and shoot other tanks.

I am trying to build a neural network for the enemies, probably trained with reinforcement learning, so that the game can decide which tanks should move where, whether they should shoot, etc., by passing in the attributes of each object.

Attributes:
Enemy -> x, y, width, height, speed, health and other items  
Wall -> x, y, width, height  
Bullet -> x, y, width, height, speed  
Player -> x, y, width, height, speed, health  

I was planning to use the Keras Python module to create the neural network; however, I cannot find a way to set it up so that the input data has the correct shape and size, since there will be a variable number of walls and bullets.

What I would like to do:

action = Network.predict(state)

where
state = (Enemy, Player, Tuple_of_Wall_Data, Tuple_of_Bullet_Data)

and action tells the enemy where it should move and whether it should shoot, in the form
action = (Direction, Should_Shoot)

TL;DR: My question is, how would I set up a neural network input layer so that it can take (1 enemy, 1 player, multiple walls, multiple bullets), and how would I train that network with reinforcement learning to give the enemy a direction and a decision on whether to fire?


Solution

  • There are three typical ways of representing the game state for an AI agent:

    1. Internal game state, pretty much what you are proposing - a list of the objects in the game with their raw attributes. If you want to use ML on this, you need an architecture that can deal with variable-size input, so you will probably end up with a recurrent neural network that processes the objects one by one (see the first sketch after this list). Note that this is probably a highly suboptimal representation; in particular, as a human you do not receive the game state like this - you do not get a stream of objects.

    2. Global map view. If the map is small enough, it can be fed in whole as the input to the agent. You end up with a fully observable problem and data of the form W x H x K, where W and H are the width and height of the map and K is the number of object types (so you get a one-hot encoding of each object); see the second sketch after this list.

    3. Agent's "vision", which is probably the most popular representation in modern RL: the agent is again presented with a W x H x K tensor, but now W and H are the size of its field of vision, which moves with the agent (see the third sketch after this list).
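
A minimal Keras sketch of option 1, assuming each game object is flattened into a fixed-length feature vector (six values here, zero-padded where an attribute does not apply) and the variable-length object list is summarised by an LSTM. The feature count, number of directions and layer sizes are illustrative assumptions, not values taken from the question:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

FEATURES = 6          # per-object feature vector length (assumption)
NUM_DIRECTIONS = 4    # e.g. up / down / left / right (assumption)

# Variable number of objects per state: the sequence length is left as None
# and zero-padded entries are skipped via masking.
inputs = layers.Input(shape=(None, FEATURES))
x = layers.Masking(mask_value=0.0)(inputs)
x = layers.LSTM(64)(x)                 # summarises the whole object list
direction = layers.Dense(NUM_DIRECTIONS, activation="softmax", name="direction")(x)
should_shoot = layers.Dense(1, activation="sigmoid", name="should_shoot")(x)
model = keras.Model(inputs, [direction, should_shoot])

# One state = one sequence of objects (enemy, player, walls..., bullets...).
state = np.random.rand(1, 7, FEATURES).astype("float32")   # e.g. 7 objects
pred_direction, pred_shoot = model.predict(state)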
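
A minimal sketch of option 2, assuming a discrete map of MAP_W x MAP_H tiles, K = 4 object types, and objects that expose tile coordinates via .x and .y. All names and sizes here are assumptions for illustration, not part of the question's game code:

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

MAP_W, MAP_H, K = 32, 24, 4
NUM_DIRECTIONS = 4
WALL, BULLET, PLAYER, ENEMY = 0, 1, 2, 3     # one channel per object type

def encode_state(walls, bullets, player, enemy):
    # One-hot encode object positions into a W x H x K grid.
    grid = np.zeros((MAP_W, MAP_H, K), dtype=np.float32)
    for w in walls:
        grid[w.x, w.y, WALL] = 1.0
    for b in bullets:
        grid[b.x, b.y, BULLET] = 1.0
    grid[player.x, player.y, PLAYER] = 1.0
    grid[enemy.x, enemy.y, ENEMY] = 1.0
    return grid

inputs = layers.Input(shape=(MAP_W, MAP_H, K))
x = layers.Conv2D(16, 3, activation="relu")(inputs)
x = layers.Conv2D(32, 3, activation="relu")(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation="relu")(x)
direction = layers.Dense(NUM_DIRECTIONS, activation="softmax", name="direction")(x)
should_shoot = layers.Dense(1, activation="sigmoid", name="should_shoot")(x)
model = keras.Model(inputs, [direction, should_shoot])

# grid = encode_state(walls, bullets, player, enemy)
# direction_probs, shoot_prob = model.predict(grid[np.newaxis, ...])

The two output heads correspond directly to the action = (Direction, Should_Shoot) format asked for in the question.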
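
A minimal sketch of option 3: crop a fixed-size window centred on the enemy out of the global grid produced above, so the network input shape stays constant even when the map is large. VIEW is an assumed vision radius:

import numpy as np

VIEW = 5    # the agent sees a (2*VIEW+1) x (2*VIEW+1) window centred on itself

def agent_view(grid, enemy_x, enemy_y, view=VIEW):
    # Return a (2*view+1, 2*view+1, K) window centred on the agent,
    # zero-padded at the map edges.
    w, h, k = grid.shape
    padded = np.zeros((w + 2 * view, h + 2 * view, k), dtype=grid.dtype)
    padded[view:view + w, view:view + h] = grid
    cx, cy = enemy_x + view, enemy_y + view
    return padded[cx - view:cx + view + 1, cy - view:cy + view + 1]

The same kind of network as in the previous sketch can then take this window as its input, with shape (2*VIEW+1, 2*VIEW+1, K) instead of (MAP_W, MAP_H, K).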