Tags: python, math, deep-learning, neural-network, derivative

Explaining the derivative in a neural network with one neuron


I have a neural network that is as simple as possible, with a single neuron, and I do not understand where the definition of the derivative variable comes from. I understand that it is the derivative, giving me the "slope" of the function, but why exactly input * clear_error? The derivative of which original function are we taking, and why that one?

weight = 10
goal = 12.9
input = 9

alpha = 0.01

for i in range(20):
    prediction = weight * input
    clear_error = prediction - goal              # signed error
    sq_error = (prediction - goal) ** 2          # squared error (the loss)

    derivative = input * clear_error             # the line I don't understand
    weight = weight - (derivative * alpha)       # gradient-descent update

    print(f"Error: {sq_error}; Prediction: {prediction}")

Solution

  • Write the weight as w, input as x, goal as g and loss function (i.e. sq_error in the code) as l.

    Then: l = (wx - g) ** 2

    What we need for backpropagation is the gradient of the loss function with respect to weight, which is dl / dw = 2x(wx - g)

    ...which is equivalent to 2 * input * clear_error.

    So the derivative in the code is actually half the true gradient, but being off by a constant factor doesn't matter: you're scaling the gradient by alpha anyway, so the factor of 2 just gets absorbed into the learning rate (see the numeric check below).


    If you want to break the calculation of the derivative down further:

    Write u = wx - g so that l = u**2

    So du / dw = x and dl / du = 2u

    Then dl / dw = (dl / du) (du / dw) (chain rule)

    i.e. dl / dw = 2ux = 2x(wx - g)

    dl / dw = 2 * input * clear_error in terms of the variables in your code.

    So derivative in the code is equivalent to (dl / dw) / 2; the sketches below confirm this both symbolically and numerically.
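
If it helps to see the chain-rule result confirmed mechanically, here is a small sketch that differentiates the loss symbolically with sympy (an extra dependency, not part of the original snippet):

import sympy as sp

w, x, g = sp.symbols("w x g")
l = (w * x - g) ** 2                  # the loss, l = (wx - g)**2

dl_dw = sp.diff(l, w)                 # sympy applies the chain rule for us
print(dl_dw)                          # an expression equal to 2*x*(w*x - g)
print(sp.simplify(dl_dw - 2 * x * (w * x - g)))   # prints 0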
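
And a purely numeric check with the numbers from the question, comparing the code's derivative to a finite-difference estimate of d(sq_error)/d(weight); the helper function and the epsilon value here are just illustrative choices:

weight, goal, input = 10.0, 12.9, 9.0

def sq_error(w):
    return (w * input - goal) ** 2

clear_error = weight * input - goal
derivative = input * clear_error      # what the original loop computes

eps = 1e-6
numeric_grad = (sq_error(weight + eps) - sq_error(weight - eps)) / (2 * eps)

print(derivative)                     # ≈ 693.9   -> half of the true gradient
print(numeric_grad)                   # ≈ 1387.8  -> matches 2 * input * clear_error

Updating with derivative * alpha is therefore the same as updating with the full gradient and alpha / 2, which is why the missing factor of 2 has no practical effect.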