deep-learning, pytorch, reinforcement-learning

What particular change in the target formula turns a neural network from gradient descent into gradient ascent?


It seemed strange when I first encountered this in reinforcement learning. The loss is MSE, so everything looks like ordinary gradient descent, and yet it is gradient ascent. I want to understand the magic. I built a neural network in numpy, and a change in a derivative turned it into gradient ascent. What particular change in the derivative leads to gradient ascent? Is it as simple as autograd detecting whether the function is concave or convex?


Solution

  • If you're doing gradient ascent, it must mean that you are doing a variant of policy gradient reinforcement learning.

    Doing gradient ascent is extremely simple. Long story short, you just apply gradient descent, except you put a minus sign in front of the gradient term!

    In TensorFlow (1.x) code:

    optimizer = tf.train.GradientDescentOptimizer(alpha)
    # negate every gradient so the step maximizes the objective instead of minimizing it
    gradients = [(-g, v) for g, v in optimizer.compute_gradients(loss)]
    update = optimizer.apply_gradients(gradients)
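
    The question is tagged pytorch, so here is the same idea as a minimal PyTorch sketch (my own illustration, not from the original answer): to do gradient ascent you simply minimize the negative of the objective, which flips the sign of every gradient.

    import torch

    theta = torch.randn(3, requires_grad=True)         # toy parameters
    optimizer = torch.optim.SGD([theta], lr=0.1)

    objective = (theta * torch.tensor([1.0, 2.0, 3.0])).sum()  # quantity we want to MAXIMIZE
    loss = -objective            # the minus sign turns descent on `loss` into ascent on `objective`
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()             # theta moves in the direction that increases `objective`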
    

    This is the basic gradient descent update, where theta represents the weights of the model, alpha is the learning rate, and dJ/dtheta is the gradient of the loss function J with respect to the weights:

    theta = theta - alpha * dJ/dtheta

    In the update above, we descend along the gradient because we want to minimize the loss. But in policy gradient methods we want to maximize the return, and since we are (intuitively) taking the gradient with respect to the reward, we want to move in the direction that increases it.

    Please see the picture below from TowardsDataScience: the weights naturally get updated in the direction of lowest J. (Notice the negative sign in front of the gradient.)

    [Figure from TowardsDataScience: gradient descent on a loss surface, with the weights moving toward the minimum of J.]

    By simply changing the sign of the update, we can instead go the other way (i.e., maximize the reward):

    theta = theta + alpha * dJ/dtheta
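
    As a quick sanity check (a toy sketch of my own, not part of the original answer), you can watch this sign flip at work on a one-dimensional objective in a few lines of plain Python:

    def dJ(theta):
        # gradient of J(theta) = -(theta - 3)^2, which is maximized at theta = 3
        return -2.0 * (theta - 3.0)

    alpha, theta = 0.1, 0.0
    for _ in range(100):
        theta = theta + alpha * dJ(theta)   # '+' gives ascent; '-' here would give descent
    print(theta)                            # converges to ~3.0, the maximizer of J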

    Below is the formal update equation for gradient ascent in policy gradient methods. The gradient of the log-policy multiplied by Vt is essentially dJ/dtheta.

    theta_{t+1} = theta_t + alpha * ∇_theta log pi_theta(a_t | s_t) * v_t
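
    In practice, frameworks like PyTorch only ever minimize, so gradient ascent is usually implemented by descending on the negated objective. A minimal, hypothetical REINFORCE-style step might look like this (names such as policy, state, and v_t are illustrative, not from the original answer):

    import torch

    policy = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.LogSoftmax(dim=-1))
    optimizer = torch.optim.SGD(policy.parameters(), lr=1e-2)

    state = torch.randn(4)                   # toy observation
    log_probs = policy(state)                # log pi_theta(. | s_t)
    action = torch.distributions.Categorical(logits=log_probs).sample()
    v_t = torch.tensor(1.7)                  # toy return / advantage for this step

    loss = -log_probs[action] * v_t          # minimizing this performs ascent on log pi * v_t
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()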