Tags: python, neural-network, backpropagation, gradient-descent

In gradient checking, do we add/subtract epsilon (a tiny value) to both theta and constant parameter b?


I've been working through Andrew Ng's DeepLearning.AI specialization (Course 2).

For the exercise on gradient checking, the starter code implements a function that converts a dictionary containing all of the weights (W) and constants (b) into a single column vector (of dimensions 47 x 1).
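Roughly, the conversion looks something like this (my own sketch, not the exact starter code; the key names W1, b1, ... and their ordering are assumptions):

```python
import numpy as np

def dictionary_to_vector(parameters):
    """Stack every W matrix and b vector from the parameter dict into one column vector."""
    keys = sorted(parameters.keys())  # e.g. ["W1", "W2", "W3", "b1", "b2", "b3"]
    vector = np.concatenate([parameters[k].reshape(-1, 1) for k in keys], axis=0)
    return vector, keys
```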

The starter code then iterates through this vector, perturbing each entry in turn by +epsilon and -epsilon to approximate the gradient numerically.
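In outline, that loop looks something like this (again a sketch rather than the exact starter code, assuming a cost function J that takes the flattened parameter vector):

```python
import numpy as np

def numerical_gradient(J, theta, epsilon=1e-7):
    """Approximate dJ/dtheta by perturbing one entry of theta at a time by +/- epsilon."""
    grad_approx = np.zeros_like(theta)
    for i in range(theta.shape[0]):
        theta_plus = np.copy(theta)
        theta_minus = np.copy(theta)
        theta_plus[i, 0] += epsilon
        theta_minus[i, 0] -= epsilon
        # Two-sided difference quotient for the i-th parameter
        grad_approx[i, 0] = (J(theta_plus) - J(theta_minus)) / (2 * epsilon)
    return grad_approx
```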

Does gradient checking generally include adding/subtracting epsilon to the constants as well? Or is it done here simply for convenience, since the constants play a relatively small role in the overall calculation of the cost function?


Solution

  • You should do it regardless, even for constants. The reason is simple: because they are constants, you know their gradient is zero, so you still want to check that you "compute" it correctly. You can see it as an additional safety net. A sketch of that comparison is below.
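As a concrete illustration of that safety net, the comparison is typically a relative-difference check over the whole flattened vector, so the entries coming from b are included automatically (a sketch; the threshold of 1e-7 is a commonly used value, not necessarily the one in the exercise):

```python
import numpy as np

def gradient_check(grad_backprop, grad_approx, threshold=1e-7):
    """Relative difference between the analytic (backprop) and numerical gradients."""
    numerator = np.linalg.norm(grad_backprop - grad_approx)
    denominator = np.linalg.norm(grad_backprop) + np.linalg.norm(grad_approx)
    difference = numerator / denominator
    return difference, difference < threshold
```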