I have a maximally simple neural network with a single neuron, and I do not understand where the definition of the derivative variable comes from. I understand that it is a derivative, giving me the "slope" of the function, but why exactly input * clear_error? The derivative of which original function are we taking, and why that one?
weight = 10
goal = 12.9
input = 9
alpha = 0.01

for i in range(20):
    prediction = weight * input
    clear_error = prediction - goal          # raw (signed) error
    sq_error = (prediction - goal) ** 2      # squared error, the loss
    derivative = input * clear_error         # the line in question
    weight = weight - (derivative * alpha)   # gradient-descent update
    print(f"Error: {sq_error}; Prediction: {prediction}")
Write the weight as w, the input as x, the goal as g, and the loss function (i.e. sq_error in the code) as l.

Then: l = (wx - g) ** 2
What we need for backpropagation is the gradient of the loss function with respect to the weight, which is dl / dw = 2x(wx - g), i.e. 2 * input * clear_error.

So the derivative in the code is actually half the true gradient, but being off by a constant factor doesn't matter here: you're scaling the gradient by alpha anyway, so it only changes the effective learning rate.
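If you want to verify this numerically, here is a small sketch (plain Python; the helper loss and the variable names are mine, not from your code) that compares input * clear_error with a central finite-difference estimate of d(sq_error)/d(weight) at the starting values:

w, x, g = 10.0, 9.0, 12.9   # weight, input, goal from the question
eps = 1e-6

def loss(w):
    return (w * x - g) ** 2   # sq_error as a function of the weight

numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)   # finite-difference dl/dw
clear_error = w * x - g
derivative = x * clear_error       # what the original code computes
analytic = 2 * x * clear_error     # dl/dw = 2x(wx - g)

print(numeric)      # ~1387.8
print(analytic)     # 1387.8
print(derivative)   # 693.9 -- exactly half of dl/dw

Using derivative instead of the full gradient just means your effective learning rate is alpha / 2.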
If you want to break the calculation of the derivative down further, write u = wx - g, so that l = u ** 2.

Then du / dw = x and dl / du = 2u, so by the chain rule:

dl / dw = (dl / du)(du / dw) = 2ux = 2x(wx - g)

which is 2 * input * clear_error in terms of the variables in your code. So derivative is equivalent to (dl / dw) / 2.
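For completeness, here is a symbolic check of the same chain-rule result (a sketch assuming you have sympy available):

import sympy as sp

w, x, g = sp.symbols('w x g')
u = w * x - g            # clear_error
l = u ** 2               # sq_error

dldw = sp.diff(l, w)     # differentiate the loss with respect to the weight
print(sp.simplify(dldw - 2 * x * (w * x - g)))   # 0, i.e. dl/dw == 2x(wx - g)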