reinforcement-learning

Q-learning algorithm: convergence on a loop (absorbing) state


This question is about Q-learning.

Please consider the following:

  1. A loop (absorbing) state J, with reward 100 for going from J to J (J is the final state; the reward for going from I to J is also 100)
  2. gamma value of 1
  3. alpha value 0.5

Say the transition J to J already has a Q value of 100. The new Q value is given by 100 + 0.5(100 + 1(100) - 100), where the max Q value over the next possible states is 100: if you are in state J, the best next move is to loop, so the max next Q value is whatever Q(J, J) currently is, i.e. 100. This gives a new Q value of 150. Taking this to its logical conclusion, every time you loop on J the Q value goes up by 50, so that particular Q value never converges (the other values do converge). This seems wrong to me. Is it? I've run this experiment many times and am still unsure. Please clarify the above point if you can. We have been taught Q-learning very badly at my university, and I have coursework to hand in in a week and a half.

Thanks!


Solution

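  • According to Wikipedia, gamma has to be strictly less than one. With gamma = 1 and a reward of 100 on every pass through the J to J loop, the return from J is an unbounded sum of 100s, so Q(J, J) has no finite value to converge to.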
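
To see the effect of gamma on the self-loop, here is a minimal sketch that applies the standard Q-learning update to the J to J transition only, using the numbers from the question (alpha 0.5, reward 100, starting Q value 100). The function name `q_self_loop`, the `steps` parameter, and the comparison value gamma = 0.9 are assumptions made for illustration; they are not from the question.

```python
def q_self_loop(gamma, alpha=0.5, reward=100.0, q0=100.0, steps=5):
    """Repeatedly apply the Q-learning update to the J -> J self-loop.

    Hypothetical helper: it models only the single Q value Q(J, J).
    """
    q = q0
    history = [q]
    for _ in range(steps):
        # Looping is the only move out of J, so the max over next
        # Q values is the current Q(J, J) itself.
        q = q + alpha * (reward + gamma * q - q)
        history.append(q)
    return history


# gamma = 1: each update adds alpha * reward = 50, so the value never settles.
undiscounted = q_self_loop(gamma=1.0)
print(undiscounted)  # [100.0, 150.0, 200.0, 250.0, 300.0, 350.0]

# gamma = 0.9 (assumed value): the update becomes q <- 0.95 * q + 50,
# which approaches the fixed point reward / (1 - gamma) = 1000.
discounted = q_self_loop(gamma=0.9, steps=2000)
print(discounted[-1])
```

With gamma = 1 the sketch reproduces the growth of 50 per loop described in the question. With any gamma strictly below 1 the same update contracts toward reward / (1 - gamma), which is why the discounted case converges.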