In "Lecture 2: Markov Decision Processes" by David Silver, page 19 gives the following derivation:
I found that E[R_{t+1} + γ G_{t+1} | S_t = s] is equal to E[R_{t+1} + γ v(S_{t+1}) | S_t = s],
which means G_{t+1} = v(S_{t+1}), so G_t = v(S_t).
According to the definition of the return:
and according to G_t = v(S_t):
But the definition of the value function is
which means
v(s) = =
which is absolutely wrong.
My questions are:
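The first big error is the claim that E[a + b] = E[a + c] implies b = c; this is not how expectations work. In particular, E[a + b] = E[a] + E[b] and E[a + c] = E[a] + E[c], so all we can conclude is E[b] = E[c] (and not b = c!). So G_{t+1} is not equal to v(S_{t+1}); rather, E[G_{t+1} | S_{t+1}] = v(S_{t+1}) (which comes from the definition). The step on the slide then follows from the law of total expectation together with the Markov property: E[G_{t+1} | S_t = s] = E[E[G_{t+1} | S_{t+1}] | S_t = s] = E[v(S_{t+1}) | S_t = s].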
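To make this concrete, here is a minimal sketch on a made-up two-step episode (the states, rewards, probabilities and discount are my own assumptions, not from the lecture). From s0 you always receive reward 1 and move to s1; from s1 you receive reward 0 or 2 with equal probability and the episode ends. The two expectations agree, yet no realised return from s1 equals v(s1).

```python
from fractions import Fraction

# Hypothetical toy episode (assumed for illustration only):
#   s0 --(reward 1)--> s1 --(reward 0 or 2, prob 1/2 each)--> terminal
gamma = Fraction(1, 2)   # assumed discount factor
reward_from_s0 = 1       # R_{t+1} when S_t = s0

# Possible returns G_{t+1} starting from s1, as (probability, return) pairs
returns_from_s1 = [(Fraction(1, 2), 0), (Fraction(1, 2), 2)]

# v(s1) = E[G_{t+1} | S_{t+1} = s1]
v_s1 = sum(p * g for p, g in returns_from_s1)

# E[R_{t+1} + gamma * G_{t+1} | S_t = s0]
lhs = sum(p * (reward_from_s0 + gamma * g) for p, g in returns_from_s1)

# E[R_{t+1} + gamma * v(S_{t+1}) | S_t = s0]; S_{t+1} is always s1 here
rhs = reward_from_s0 + gamma * v_s1

# Does any realised return from s1 equal v(s1)?
return_equals_value = [g == v_s1 for _, g in returns_from_s1]

print(v_s1, lhs, rhs)        # 1 3/2 3/2
print(return_equals_value)   # [False, False]
```

The expectations match because averaging over G_{t+1} gives v(s1), even though G_{t+1} itself is always 0 or 2 and never 1.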
In general, equality of function values does not make the arguments equal. In the same way, f(x + a) = f(x + b) does not imply a = b: for, say, f(x) = x^2, the equality holds with x = 0, a = -1, b = 1.
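The counterexample is quick to check directly; a tiny sketch using the same f, x, a and b as above:

```python
def f(x):
    """The example function f(x) = x^2."""
    return x ** 2

x, a, b = 0, -1, 1

same_value = f(x + a) == f(x + b)   # f(-1) == f(1)
same_argument = a == b

print(same_value, same_argument)    # True False
```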