machine-learning · tensorflow · deep-learning · reinforcement-learning · q-learning

Reward function with a neural network approximated Q-function


In Q-learning, how should I represent my reward function if my Q-function is approximated by a normal feed-forward neural network?

Should I represent it as discrete values such as "near", "very near" to the goal, etc.? All I'm concerned about is this: now that I have moved to a neural-network approximation of the Q-function, Q(s, a, θ), and am no longer using a lookup table, would I still be obliged to build a reward table as well?


Solution

  • There is no such thing as a "reward table". You are supposed to define a "reward signal", which is produced in a given agent-world state at a given time step. This reward should be a scalar (a number). In general you could consider more complex rewards, but in the typical Q-learning setting the reward is just a number, since the goal of the algorithm is to find a policy that maximizes the expected sum of discounted rewards. Obviously you need an object that can be added, multiplied and, finally, compared, and effectively only numbers (or things directly convertible to numbers) satisfy this. Having said that, for your particular case: if you know the distance to the goal, you can give a reward that is inversely proportional to the distance; it can even be -distance, or 1/distance (as this will guarantee better scaling).
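
For concreteness, here is a minimal sketch (in plain NumPy, deliberately not tied to any particular TensorFlow version) of what this means in practice: the reward is just a scalar returned by a function of the state (here -distance to a goal), and it enters the usual Q-learning target r + γ·max_a Q(s', a; θ) directly, so no reward table is ever built. The goal position, network sizes and 2-D state layout are made-up assumptions for illustration only.

```python
import numpy as np

# Hypothetical goal position; in your environment the goal and the
# state representation will of course be different.
GOAL = np.array([5.0, 5.0])

def reward(state):
    """Scalar reward signal: negative distance to the goal (-distance).
    1.0 / (distance + eps) would be an alternative shaping."""
    distance = np.linalg.norm(state - GOAL)
    return -distance

# A tiny feed-forward Q-network Q(s, a; theta): one hidden layer,
# one output Q-value per discrete action.
rng = np.random.default_rng(0)
N_ACTIONS, HIDDEN = 4, 16
theta = {
    "W1": rng.normal(scale=0.1, size=(2, HIDDEN)),
    "b1": np.zeros(HIDDEN),
    "W2": rng.normal(scale=0.1, size=(HIDDEN, N_ACTIONS)),
    "b2": np.zeros(N_ACTIONS),
}

def q_values(state, theta):
    """Forward pass returning Q(s, a; theta) for every action a."""
    h = np.tanh(state @ theta["W1"] + theta["b1"])
    return h @ theta["W2"] + theta["b2"]   # shape: (N_ACTIONS,)

# One Q-learning target: the reward appears only as the scalar r.
gamma = 0.99
s, s_next = np.array([0.0, 0.0]), np.array([1.0, 0.0])
r = reward(s_next)
td_target = r + gamma * np.max(q_values(s_next, theta))
```

The network's parameters θ would then be updated (e.g. by gradient descent) to move Q(s, a; θ) for the taken action towards td_target; the reward function itself stays a simple scalar-valued function of the state, regardless of whether Q is a table or a network.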