I'm using Q-learning and I want to know whether I can use the tf.losses.mean_squared_error loss function when my reward function can give negative rewards.
For example, if my network outputs the Q-values (0.1, 0.2, 1) and I calculate that the true Q-values should be (0.1, -5, 1), then with mean_squared_error the loss for the second Q-value will come out positive, won't it? Because of the square operation, won't gradient descent then be based on an incorrect loss?
Yes, it works fine. Look at the MSE cost function:
mse = tf.reduce_mean(tf.square((x*w+b)-y))
The cost function squares the difference, so a negative error always becomes positive. And you are correct: an error of 7 - 1 = 6 and an error of 5 - (-1) = 6 give the same cost, 6² = 36.
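A minimal sketch (plain Python, no TensorFlow; the helper `mse` is illustrative, not the library function) showing that the squared error stays positive even when a target Q-value is negative, using the numbers from the question:

```python
# The squared error is positive whether or not the target is negative,
# so negative Q-value targets pose no problem for an MSE loss.
def mse(predictions, targets):
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(predictions)

q_pred = [0.1, 0.2, 1.0]   # network output from the question
q_true = [0.1, -5.0, 1.0]  # target Q-values, including a negative one

loss = mse(q_pred, q_true)  # only the second term contributes: (0.2 - (-5))**2 = 27.04
print(loss)
```

The loss is 27.04 / 3 ≈ 9.01, a positive number, exactly as a loss should be.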
To understand gradient descent better, you need to know how the mse is minimized. In the image below, you can see the current mse produced by x*w+b.
At this point, gradient descent uses the slope to decide the direction in which w should change. The slope is calculated as a derivative; differentiating the mse function gives the formula below. From it you can see the direction w will move: w moves to the left if ((w*x-y)*x) > 0 and to the right otherwise.
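The update rule above can be sketched in one gradient-descent step. This is a toy example with a single sample; the values of x, y, w, b, and the learning rate lr are illustrative, and b is set to 0 to match the ((w*x-y)*x) form used above:

```python
# One gradient-descent step on mse = (w*x + b - y)**2 for a single sample.
x, y = 2.0, 3.0   # input and target (illustrative values)
w, b = 4.0, 0.0   # current weight and bias
lr = 0.1          # learning rate

# Derivative of the cost with respect to w: 2*(w*x + b - y)*x
grad_w = 2 * (w * x + b - y) * x  # 2*(8 - 3)*2 = 20 > 0, so w should move left
w = w - lr * grad_w               # 4.0 - 0.1*20 = 2.0
print(w)
```

Since (w*x - y)*x = 10 > 0 here, the update decreases w (moves it to the left), exactly as stated above.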