
Confusion about sigmoid derivative's input in backpropagation


When using the chain rule to calculate the slope of the cost function with respect to the weights at layer L, the formula becomes:

dC0/dW(L) = ... * da(L)/dz(L) * ...

With :

z(L) being the induced local field: z(L) = w1(L) * a1(L-1) + w2(L) * a2(L-1) + ...

a(L) being the output: a(L) = &(z(L))

& being the sigmoid function used as an activation function

Note that L is taken as a layer indicator and not as an index

Now:
da(L)/dz(L) = &'(z(L))

With &' being the derivative of the sigmoid function
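
As a quick sanity check (a minimal NumPy sketch with an arbitrary sample value for z, nothing taken from the post), the analytic derivative &(z) * (1 - &(z)) agrees with a finite-difference approximation of da/dz:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7                                                      # arbitrary sample value for z(L)
eps = 1e-6

analytic = sigmoid(z) * (1.0 - sigmoid(z))                   # &'(z) = &(z) * (1 - &(z))
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)  # finite-difference da(L)/dz(L)

print(analytic, numeric)                                     # both ~0.2217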

The problem:

But in this post, written by James Loy, on building a simple neural network from scratch with Python, when doing the backpropagation he didn't pass z(L) to &' in place of da(L)/dz(L) in the chain rule. Instead, he passed the output, i.e. the last activation of layer L, as the input to the sigmoid derivative &':

def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

def backprop(self):
        # application of the chain rule to find derivative of the loss function with respect to weights2 and weights1
        d_weights2 = np.dot(self.layer1.T, (2*(self.y - self.output) * sigmoid_derivative(self.output)))

Note that in the code above, the layer L is layer 2, which is the last (output) layer. sigmoid_derivative(self.output) is where the activation of the current layer is passed as input to the derivative of the sigmoid function used as an activation function.
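
For concreteness, here is a minimal, hypothetical reproduction with dummy shapes and values (the array sizes and numbers are assumptions, not taken from the post); it only illustrates that self.output is exactly sigmoid applied to np.dot(self.layer1, self.weights2):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

layer1 = np.array([[0.2, 0.5, 0.1]])        # stand-in for self.layer1 (1 sample, 3 hidden units)
weights2 = np.array([[0.4], [0.3], [0.9]])  # stand-in for self.weights2 (3 hidden units -> 1 output)

z2 = np.dot(layer1, weights2)               # the induced local field z(L) of the output layer
output = sigmoid(z2)                        # stand-in for self.output, i.e. a(L) = &(z(L))

# The question below is therefore whether the derivative function expects z2 or output.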

The question:

Shouldn't we use sigmoid_derivative(np.dot(self.layer1, self.weights2)) instead of sigmoid_derivative(self.output)?


Solution

  • It turned out that &(z(L)), i.e. the output, was passed simply to accommodate the way sigmoid_derivative was implemented.

    Here is the code of the sigmoid_derivative:

    def sigmoid(x):
        return 1.0/(1+ np.exp(-x))
    
    def sigmoid_derivative(x):
        return x * (1.0 - x)
    

    The mathematical formula of the sigmoid derivative can be written as: &'(x) = &(x) * (1 - &(x))

    So, to arrive at the formula above, &(z) rather than z was passed to sigmoid_derivative so that it returns &(z) * (1.0 - &(z)).
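
    A short check of that point (a sketch with an arbitrary value of z, assuming nothing beyond the two functions above): passing &(z) into this sigmoid_derivative gives the same number as a hypothetical variant that takes z directly and computes &(z) * (1 - &(z)) internally:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_derivative(x):         # expects x = &(z), as in the implementation above
        return x * (1.0 - x)

    def sigmoid_derivative_from_z(z):  # hypothetical variant that expects z itself
        s = sigmoid(z)
        return s * (1.0 - s)

    z = 1.3                            # arbitrary sample value of z(L)
    print(sigmoid_derivative(sigmoid(z)))    # &(z) * (1 - &(z))
    print(sigmoid_derivative_from_z(z))      # same value

    So, with this implementation, sigmoid_derivative(self.output) is the correct call; passing np.dot(self.layer1, self.weights2) to it would compute z * (1 - z), which is not the sigmoid derivative.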