python · tensorflow · machine-learning · linear-regression · gradient-descent

What is the difference between step size and learning rate in machine learning?


I am using TensorFlow to implement some basic ML code. I was wondering if anyone could give me a short explanation of the meaning of and difference between step size and learning rate in the following functions.

I used tf.train.GradientDescentOptimizer() to set the learning rate parameter and linear_regressor.train() to set the number of steps. I've been looking through the documentation for these functions on tensorflow.org, but I still don't have a complete grasp of what these parameters mean.

Thank you and let me know if there is any more info I can provide.


Solution

  • In SGD, you compute the gradient for a batch and move the parameters in the direction opposite to that gradient, by an amount scaled by the learning rate lr:

    params = old_params - lr * grad
    

    where grad is the gradient of the loss w.r.t. the params.

    The step count in TensorFlow or similar libraries (e.g. the steps argument of linear_regressor.train()) just denotes how many such updates are performed. So if you have steps=1000 and lr=0.5, the update above is applied 1000 times, each time moving the parameters by 0.5 times the current gradient.
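    To make the two knobs concrete, here is a minimal NumPy sketch (not the TF Estimator API) that fits a line y = w*x + b by repeating the update rule above: lr scales each update, and steps counts how many updates run. The data and variable names are illustrative assumptions.

    ```python
    import numpy as np

    # Synthetic data from a known line: w = 3, b = 2
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=100)
    y = 3.0 * x + 2.0

    w, b = 0.0, 0.0   # initial parameters
    lr = 0.5          # learning rate: how far each update moves the parameters
    steps = 1000      # number of gradient updates performed in total

    for _ in range(steps):
        pred = w * x + b
        # Gradients of the mean squared error w.r.t. w and b
        grad_w = 2 * np.mean((pred - y) * x)
        grad_b = 2 * np.mean(pred - y)
        # The update rule: params = params - lr * grad
        w -= lr * grad_w
        b -= lr * grad_b

    print(round(w, 3), round(b, 3))  # converges close to w=3, b=2
    ```

    Halving lr makes each step smaller (so you may need more steps to converge), while increasing steps simply repeats the same update more times.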