I'm studying CS231N, lecture 14, "Reinforcement Learning". In the lecture, the instructor mentioned the value function, which is shown in the picture:
I am wondering what that bar between $r_t$ and $s_0$ is. I thought it was something like conditional probability, but I'm not sure about it. Or is it just a division?
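It's conditioning, the same bar used in conditional probability, not a division. Because it sits inside an expectation, the expression is a conditional expectation: it literally reads as the reward at time $t$, given that the starting state is $s_0 = s$ and that the agent follows policy $\pi$. Everything to the right of the bar is what is held fixed; the expectation averages over everything else, that is, over the random trajectories that can follow.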
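To make the "given" reading concrete, here is a minimal sketch that estimates a value of the form E[sum_t gamma^t r_t | s_0 = s, pi] by sampling, assuming that is the expression on the slide. The two-state dynamics in `step`, the discount `GAMMA = 0.9`, and all function names are made up for illustration and are not from the lecture. Conditioning on `s_0 = s` simply means every sampled trajectory is forced to start in `s`; conditioning on the policy means the (fixed) policy is baked into `step`.

```python
import random

GAMMA = 0.9  # discount factor (assumed value, for illustration only)


def step(state, rng):
    """Hypothetical toy dynamics under a fixed policy pi.

    Returns (reward, next_state).
    """
    if state == 0:
        # State 0 pays reward 1, then stays in 0 or moves to 1 with equal probability.
        return 1.0, (0 if rng.random() < 0.5 else 1)
    # State 1 is absorbing and pays nothing.
    return 0.0, 1


def sampled_return(s0, rng, horizon=100):
    """One sample of sum_t gamma^t * r_t for a trajectory that starts in s0."""
    state, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        reward, state = step(state, rng)
        total += discount * reward
        discount *= GAMMA
    return total


def estimate_value(s0, episodes=5000, seed=0):
    """Monte Carlo estimate of E[sum_t gamma^t r_t | s_0 = s0, pi]."""
    rng = random.Random(seed)
    return sum(sampled_return(s0, rng) for _ in range(episodes)) / episodes


v0 = estimate_value(0)  # condition on starting in state 0
v1 = estimate_value(1)  # condition on starting in state 1
print(v0, v1)
```

For this toy chain the exact answer can be checked by hand: the value of state 0 satisfies V(0) = 1 + 0.9 * 0.5 * V(0), so V(0) = 1 / 0.55, roughly 1.82, while V(1) = 0. The estimate for state 0 should land close to that, and changing the state on the right of the bar changes the answer, which is exactly what the notation expresses.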