Let k = alpha * dJ(theta1)/dtheta1
theta1 := theta1 - k
In the course, Andrew Ng says that alpha is the learning rate. If the derivative is positive we subtract k, and if it is negative we effectively add it. Why do we need to subtract alpha * dJ(theta1)/dtheta1 instead of alpha * just the sign of the derivative? What is the need for the multiplication there? Thanks.
We need the step k to shrink as we approach the minimum. As we get close to the minimum, the derivative goes to zero, so multiplying alpha by the derivative produces a step that automatically tends to zero near the minimum. If we used only the sign of the derivative, every step would have the fixed size alpha, so theta1 would overshoot and oscillate around the minimum instead of settling on it.
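
To make this concrete, here is a minimal sketch (my own toy example, not from the course) comparing the two updates on J(theta) = theta^2, whose derivative is 2 * theta. The derivative-scaled step converges to the minimum, while the sign-only step keeps a fixed step of size alpha and bounces around it:

```python
# Toy example: minimize J(theta) = theta^2, with dJ/dtheta = 2 * theta.

def dJ(theta):
    return 2.0 * theta

def sign(x):
    return (x > 0) - (x < 0)   # +1, -1, or 0

alpha = 0.15   # learning rate
theta0 = 1.0   # starting point
steps = 20

# Update 1: theta := theta - alpha * dJ(theta)
# The step shrinks automatically as the derivative approaches zero.
theta = theta0
for _ in range(steps):
    theta -= alpha * dJ(theta)
print("derivative-scaled step ->", theta)   # ~0.0008, converging to 0

# Update 2: theta := theta - alpha * sign(dJ(theta))
# Every step has the fixed size alpha, so theta ends up bouncing
# around the minimum instead of settling on it.
theta = theta0
for _ in range(steps):
    theta -= alpha * sign(dJ(theta))
print("sign-only step        ->", theta)    # oscillates between ~0.10 and ~-0.05
```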