reinforcement-learning, cs231n

CS231n Lecture 14 (Reinforcement Learning): what does the bar in the value function mean?


I'm studying CS231N, lecture 14, "Reinforcement Learning". In the lecture, the instructor mentioned the value function, which is shown in the picture:

[Image: the value function definition from the lecture slide]

What does the vertical bar between r_t and s_0 mean? I thought it might denote something like conditional probability, but I'm not sure. Or is it just division?


Solution

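  • It's the "given" bar used for conditioning, not division. Because it sits inside an expectation, the expression is a conditional expectation rather than a conditional probability: the expected discounted sum of the rewards r_t, given that the initial state s_0 is s and that actions are chosen by following policy pi.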
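
To make the conditioning concrete, here is a minimal sketch. It assumes the slide shows the standard definition V^pi(s) = E[ sum_{t>=0} gamma^t r_t | s_0 = s, pi ]. The two-state environment, the function names, and gamma = 0.9 are all made up for illustration and are not from the lecture. "Given s_0 = s" simply means every sampled trajectory starts in s, and "given pi" means the actions come from the fixed policy (here already folded into the transition rule).

```python
import random

GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def step(state, rng):
    """Hypothetical toy dynamics with the policy pi already folded in.

    State 0 pays reward 1 and moves to state 1 with probability 0.5.
    State 1 is absorbing and pays reward 0.
    """
    if state == 0:
        reward = 1.0
        next_state = 1 if rng.random() < 0.5 else 0
    else:
        reward = 0.0
        next_state = 1
    return reward, next_state


def sampled_return(s0, rng, horizon=100):
    """One sample of sum_t gamma^t * r_t for a trajectory that starts in s0."""
    state, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):  # gamma^100 is negligible, so truncation is harmless
        reward, state = step(state, rng)
        total += discount * reward
        discount *= GAMMA
    return total


def estimate_value(s0, n_episodes=20000, seed=0):
    """Monte Carlo estimate of V(s0): average the return over many
    trajectories that all satisfy the condition s_0 = s0."""
    rng = random.Random(seed)
    return sum(sampled_return(s0, rng) for _ in range(n_episodes)) / n_episodes


v0 = estimate_value(0)  # exact value solves V = 1 + 0.9 * 0.5 * V, i.e. 1 / 0.55
v1 = estimate_value(1)  # state 1 never pays a reward, so its value is 0
print(v0, v1)
```

Changing the argument of `estimate_value` changes what is on the right-hand side of the bar; the quantity being averaged (the discounted sum of rewards) stays the same.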