
deep reinforcement learning parameters and training time for a simple game


I want to learn how deep reinforcement learning algorithms work and how long they take to train on a given environment. I came up with a very simple example environment:

There is a counter that holds an integer between 0 and 100. Its goal is to count up to 100.

There is one parameter, direction, whose value can be +1 or -1. It simply shows the direction to move.

Our neural network takes this direction as input and has 2 possible actions as output:

  1. Change the direction
  2. Do not change the direction

The 1st action simply flips the direction (+1 => -1 or -1 => +1). The 2nd action keeps the direction as it is.
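The environment described above can be sketched in a few lines of Python. This is only an illustration, not the code from the tutorial: the class name `CounterEnv`, the start at counter 0 with direction +1, moving one step per action, and clamping the counter to the range 0 to 100 are all assumptions that the description leaves open.

```python
class CounterEnv:
    """Toy counter environment (hypothetical sketch of the description above)."""

    FLIP, KEEP = 0, 1  # action 0 flips the direction, action 1 keeps it

    def __init__(self, start=0, goal=100):
        # Assumption: the counter starts at 0 and the goal is 100.
        self.start = start
        self.goal = goal
        self.reset()

    def reset(self):
        self.counter = self.start
        self.direction = 1  # assumption: the initial direction is +1
        return self.direction  # the observation is just the direction

    def step(self, action):
        if action == self.FLIP:
            self.direction = -self.direction
        # Assumption: the counter moves one unit per action and is clamped to [0, goal].
        self.counter = max(0, min(self.goal, self.counter + self.direction))
        done = self.counter == self.goal
        return self.direction, done


env = CounterEnv()
obs = env.reset()
steps = 0
done = False
while not done:
    obs, done = env.step(CounterEnv.KEEP)  # never flipping counts straight up
    steps += 1
print(steps)  # 100
```

No reward is returned here on purpose, since the choice of reward is exactly what the question asks about.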

I am using Python for the backend and JavaScript for the frontend. Training seems to take too much time, and the behavior is still pretty random. I have used a 4-layer perceptron, a learning rate of 0.001, and memory learning with a batch size of 100. The code is from a Udemy tutorial on Artificial Intelligence and is working properly.

My question is: what should the reward be for completion and for each state? And how much time is required to train an example as simple as this?


Solution

  • In reinforcement learning, the underlying reward function is what defines the game. Different reward functions lead to different games with different optimal strategies.

    In your case there are a few different possibilities:

    1. Give +1 for reaching 100 and only then.
    2. Give +1 for reaching 100 and -0.001 for every time step it is not at 100.
    3. Give +1 for going up and -1 for going down.

    The third case is way too easy, since there is no long-term planning involved. In the first two cases the agent will only start learning once it accidentally reaches 100 and sees that it is good. But in the first case, once it learns to go up, it doesn't matter how long it takes to get there. The second is the most interesting, since the agent needs to get there as fast as possible.

    There is no right answer for what reward to use, but ultimately the reward you choose defines the game you are playing.
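To make the three options concrete, here is a minimal sketch of each as a Python function, plus a comparison of a direct path with one that wanders. The function names and the two example paths are made up for illustration, returns are undiscounted, and giving 0 when the counter does not move in option 3 is an assumption.

```python
GOAL = 100  # target value of the counter, as in the question


def reward_sparse(prev_counter, counter):
    """Option 1: +1 on reaching the goal, 0 otherwise."""
    return 1.0 if counter == GOAL else 0.0


def reward_time_penalty(prev_counter, counter):
    """Option 2: +1 on reaching the goal, -0.001 for every other step."""
    return 1.0 if counter == GOAL else -0.001


def reward_dense(prev_counter, counter):
    """Option 3: +1 for going up, -1 for going down (assumption: 0 if unchanged)."""
    if counter > prev_counter:
        return 1.0
    if counter < prev_counter:
        return -1.0
    return 0.0


def episode_return(path, reward_fn):
    """Undiscounted sum of rewards along a sequence of counter values."""
    return sum(reward_fn(a, b) for a, b in zip(path, path[1:]))


# Straight up: 0, 1, ..., 100 (100 steps).
direct = list(range(0, GOAL + 1))
# Up to 50, back down to 0, then up to 100 (200 steps).
detour = list(range(0, 51)) + list(range(49, -1, -1)) + list(range(1, GOAL + 1))

for name, fn in [("sparse", reward_sparse), ("time penalty", reward_time_penalty)]:
    print(name, round(episode_return(direct, fn), 3), round(episode_return(detour, fn), 3))
```

Under option 1 both paths earn the same return, so nothing pushes the agent to hurry. Under option 2 the detour earns less, which is what makes speed part of the game.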

    Note: a 4-layer perceptron for this problem is big-time overkill. One layer should be enough (this problem is very simple). Have you tried the reinforcement learning environments in OpenAI's Gym? I highly recommend them; they have all the "classical" reinforcement learning problems.
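To get a feel for how small the problem is, here is a rough sketch that swaps the neural network for a 2x2 Q-table (tabular Q-learning, not the deep method from the tutorial) and uses the third reward option. Everything specific in it is an assumption chosen for illustration: the hyperparameters `alpha`, `gamma` and `epsilon`, the random initial direction, the start at counter 0, the clamping at 0, and the cap of 10,000 steps per episode.

```python
import random

GOAL = 100
FLIP, KEEP = 0, 1
DIRECTIONS = (-1, 1)  # state index 0 means direction -1, index 1 means +1


def step(counter, direction, action):
    """One environment step; returns (counter, direction, reward, done)."""
    if action == FLIP:
        direction = -direction
    new_counter = max(0, min(GOAL, counter + direction))
    # Dense reward: +1 for going up, -1 for going down, 0 if clamped at 0.
    reward = float(new_counter - counter)
    return new_counter, direction, reward, new_counter == GOAL


def train(episodes=50, alpha=0.1, gamma=0.5, epsilon=0.1, max_steps=10_000, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    for _ in range(episodes):
        counter, direction = 0, rng.choice(DIRECTIONS)
        for _ in range(max_steps):
            s = DIRECTIONS.index(direction)
            # Epsilon-greedy action selection (ties go to FLIP).
            if rng.random() < epsilon:
                a = rng.randrange(2)
            else:
                a = KEEP if q[s][KEEP] > q[s][FLIP] else FLIP
            counter, direction, r, done = step(counter, direction, a)
            s2 = DIRECTIONS.index(direction)
            # Standard Q-learning target; no bootstrapping on the terminal step.
            target = r if done else r + gamma * max(q[s2])
            q[s][a] += alpha * (target - q[s][a])
            if done:
                break
    return q


q = train()
policy = {d: ("keep" if q[i][KEEP] > q[i][FLIP] else "flip")
          for i, d in enumerate(DIRECTIONS)}
print(policy)
```

With these settings the table should settle on flipping when the direction is -1 and keeping it when the direction is +1, and the whole run is only a few thousand environment steps. With the sparse rewards (options 1 and 2) the direction alone still determines the best action, but the agent first has to stumble onto 100, so early episodes can take much longer.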