Tags: reinforcement-learning, markov-chains, markov

Why Gt+1 = v(St+1) in Bellman Equation for MRPs?


In "Lecture 2: Markov Decision Processes" by David Silver, page 19 derives the following formula for the value function:

$$
\begin{aligned}
v(s) &= \mathbb{E}[G_t \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma (R_{t+2} + \gamma R_{t+3} + \dots) \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s] \\
     &= \mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]
\end{aligned}
$$

I found that $\mathbb{E}[R_{t+1} + \gamma G_{t+1} \mid S_t = s]$ is equal to $\mathbb{E}[R_{t+1} + \gamma v(S_{t+1}) \mid S_t = s]$, which means Gt+1 = v(St+1), and so Gt = v(St).

According to the definition of the return:

$$
G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
$$

and according to Gt = v(St):

$$
v(S_t) = G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}
$$

But the definition of the value function is

$$
v(s) = \mathbb{E}[G_t \mid S_t = s]
$$

which means

$$
v(s) = \mathbb{E}[G_t \mid S_t = s] = G_t = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1},
$$

i.e. the value function would equal the raw random return itself, which is absolutely wrong.

My questions are:

  1. Why does Gt+1 = v(St+1)?
  2. Where is the mistake in my derivation?

Solution

  • The first big error is the claim that E[a + b] = E[a + c] implies b = c; this is not how expectations work. By linearity, E[a + b] = E[a] + E[b] and E[a + c] = E[a] + E[c], so all we can conclude is E[b] = E[c] (and not b = c!). So G_{t+1} is not equal to v(S_{t+1}); rather E[G_{t+1} | S_{t+1}] = v(S_{t+1}), which comes straight from the definition of the value function. The step in the slide then follows from the law of total expectation: E[G_{t+1} | S_t = s] = E[E[G_{t+1} | S_{t+1}] | S_t = s] = E[v(S_{t+1}) | S_t = s]. The simulation sketch at the end of this answer illustrates the difference numerically.

    In general, equality of function values does not make the arguments equal. In the same way, f(x + a) = f(x + b) does not imply a = b: for f(x) = x^2 it also holds with x = 0, a = -1, b = 1.
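
To make the distinction concrete, here is a minimal sketch using an assumed toy MRP (not from the lecture): two non-terminal states A and B with gamma = 1, where leaving A always gives reward 0 and leaving B gives reward 2 or 0 with equal probability, so v(B) = 1 and v(A) = 0 + 1 · v(B) = 1. Each individual sampled return G_{t+1} from B is 0 or 2 and is never equal to v(B), but the average of the sampled returns does match v(B), i.e. E[G_{t+1} | S_{t+1} = B] = v(B).

```python
import random

# Assumed toy MRP (not from the lecture): A -> B -> terminal, gamma = 1.
# Leaving A always gives reward 0; leaving B gives reward 2 or 0 with
# probability 1/2 each. Hence v(B) = 1 and v(A) = 0 + 1 * v(B) = 1.

GAMMA = 1.0

def sample_return_from_B():
    """One sampled return G starting from state B: a single random reward."""
    return random.choice([2.0, 0.0])

def sample_return_from_A():
    """One sampled return G_t from A, plus the tail return G_{t+1} from B."""
    g_next = sample_return_from_B()        # this is G_{t+1}
    return 0.0 + GAMMA * g_next, g_next    # G_t = R_{t+1} + gamma * G_{t+1}

random.seed(0)
n = 100_000
samples = [sample_return_from_A() for _ in range(n)]
returns_from_A = [g for g, _ in samples]
tail_returns = [g_next for _, g_next in samples]

# Individual G_{t+1} values are 0 or 2 -- never v(B) = 1 ...
print("distinct G_{t+1} values:", sorted(set(tail_returns)))
# ... but their expectation matches v(B), i.e. E[G_{t+1} | S_{t+1}=B] = v(B).
print("mean of G_{t+1}        :", sum(tail_returns) / n)
# And v(A) = E[R_{t+1} + gamma * G_{t+1} | S_t=A] also comes out as ~1.
print("mean return from A     :", sum(returns_from_A) / n)
```

The point is that replacing G_{t+1} by v(S_{t+1}) is only valid inside the expectation; the random return itself keeps its spread and is generally not equal to the value of the state it starts from.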