tensorflow, neural-network, reinforcement-learning, loss-function, q-learning

tf.losses.mean_squared_error with negative target


I'm using Q-learning, and I want to know whether I can use the tf.losses.mean_squared_error loss function when my reward function can return negative rewards.

For example, suppose my network outputs the Q-values (0.1, 0.2, 1) and I calculate that the target Q-values should be (0.1, -5, 1). If I use mean_squared_error, the loss for the second Q-value will be positive because of the squaring, won't it? Does that mean gradient descent will not be based on the correct loss?


Solution

  • Yes, it works well.

    Consider the mse cost function:

    mse = tf.reduce_mean(tf.square((x*w+b)-y))
    

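    The cost function computes the square of a difference, so a negative difference always becomes a positive cost.

    So you are correct that the loss is positive.
    For example, a difference of 6 (such as 7 - 1) and a difference of -6 both give the same cost, 36.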
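
As a rough illustration of the point above, here is a minimal sketch in plain Python (no TensorFlow needed) that applies the same mean-of-squares formula to the Q-values from the question. The gradient line is the standard derivative of a mean squared error with respect to each prediction; the variable names are made up for this example.

```python
# Q-values from the question: network output vs. computed targets.
predicted = [0.1, 0.2, 1.0]
target = [0.1, -5.0, 1.0]
n = len(predicted)

# Per-element squared error: never negative, whatever the sign of the target.
squared_errors = [(p - t) ** 2 for p, t in zip(predicted, target)]
mse = sum(squared_errors) / n

# Gradient of the MSE with respect to each prediction: 2 * (p - t) / n.
# Unlike the loss value, it keeps the sign of the difference.
grad = [2 * (p - t) / n for p, t in zip(predicted, target)]

print(squared_errors)
print(mse)
print(grad)
```

The second squared error is large and positive (about 27.04), but the matching gradient entry is also positive, so a gradient descent step lowers the second Q-value toward -5. The positive loss does not push the update in the wrong direction.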

    To understand gradient descent, you need to know how the mse is minimized. Plotted against w, the mse of this linear model is a bowl-shaped curve, and the current value of x*w+b places you at one point on it.
    At that point, gradient descent uses the slope of the curve to decide in which direction to change w.

    The slope is given by the derivative of the mse with respect to w:

    d(mse)/dw = 2 * mean(((x*w+b)-y) * x)

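    So the sign of the slope sets the direction in which w moves: ignoring the bias b, w is moved to the left (decreased) if ((w*x-y)*x) > 0 and to the right (increased) if it is negative. A negative target y only changes the sign of w*x-y, so gradient descent still moves w in the correct direction.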
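
The update rule described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the toy data (all targets negative, generated by y = -2 * x), the learning rate, the step count, and the helper names `mse` and `mse_slope` are hypothetical, and the bias b is left out as in the expression above.

```python
# Hypothetical toy data: every target is negative, generated by y = -2 * x.
xs = [1.0, 2.0, 3.0]
ys = [-2.0, -4.0, -6.0]


def mse(w):
    """Mean squared error of the model w * x (bias omitted)."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def mse_slope(w):
    """Derivative of the mse with respect to w: 2 * mean((w*x - y) * x)."""
    return 2 * sum((w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.0               # starting point
learning_rate = 0.05  # assumed value, chosen only for this example
initial_loss = mse(w)

for _ in range(50):
    # Positive slope: w moves left (decreases). Negative slope: w moves right.
    w -= learning_rate * mse_slope(w)

print(w, mse(w))
```

At w = 0 the slope is positive, so w is pushed to the left and settles close to -2, even though every target is negative and the loss itself never is.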