I'm using Q-learning and I want to know whether I can use the tf.losses.mean_squared_error loss function when my reward function can give negative rewards.
For example, if my network outputs the Q-values (0.1, 0.2, 1) and I calculate that the true Q-values should be (0.1, -5, 1), then with mean_squared_error the loss for the second Q-value will come out positive, won't it? Because of the square operation, won't gradient descent then be based on an incorrect loss?
Yes, it works fine. Look at the MSE cost function:
mse = tf.reduce_mean(tf.square((x*w+b)-y))
The cost function squares the difference, so a negative error always becomes positive. And you are correct: an error of 7 - 1 = 6 and an error of 5 - (-1) = 6 give the same cost, 6² = 36.
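A minimal sketch (plain Python, no TensorFlow; the helper `mse` is illustrative, not the library function) showing that the squared error stays positive even when a target Q-value is negative, using the numbers from the question:

```python
# The squared error is positive whether or not the target is negative,
# so negative Q-value targets pose no problem for an MSE loss.
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

q_pred = [0.1, 0.2, 1.0]   # network output from the question
q_true = [0.1, -5.0, 1.0]  # target Q-values, including a negative one

loss = mse(q_pred, q_true)  # only the second term contributes: (0.2 - (-5))**2 = 27.04
print(loss)
```

The loss is 27.04 / 3 ≈ 9.01, a positive number, exactly as a loss should be.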
To understand gradient descent better, you need to know how the mse is minimized. In the image below, you can see the current mse produced by x*w+b.
At this point, gradient descent uses the slope to decide the direction in which w should change. The slope is calculated as a derivative; differentiating the mse function gives the formula below. From it you can see the direction w will move: w moves to the left if ((w*x-y)*x) > 0 and to the right otherwise.
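The update rule above can be sketched in one gradient-descent step. This is a toy example with a single sample; the values of x, y, w, b, and the learning rate lr are illustrative, and b is set to 0 to match the ((w*x-y)*x) form used above:

```python
# One gradient-descent step on mse = (w*x + b - y)**2 for a single sample.
x, y = 2.0, 3.0   # input and target (illustrative values)
w, b = 4.0, 0.0   # current weight and bias
lr = 0.1          # learning rate

# Derivative of the cost with respect to w: 2*(w*x + b - y)*x
grad_w = 2 * (w * x + b - y) * x  # 2*(8 - 3)*2 = 20 > 0, so w should move left
w = w - lr * grad_w               # 4.0 - 0.1*20 = 2.0
print(w)
```

Since (w*x - y)*x = 10 > 0 here, the update decreases w (moves it to the left), exactly as stated above.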