When using the chain rule to calculate the slope of the cost function with respect to the weights at layer $L$, the formula becomes:

$$\frac{\partial C_0}{\partial w^{(L)}} = \cdots \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \cdots$$
with:

- $z^{(L)}$ being the induced local field: $z^{(L)} = w_1^{(L)} a_1^{(L-1)} + w_2^{(L)} a_2^{(L-1)} + \cdots$
- $a^{(L)}$ being the output: $a^{(L)} = \sigma\left(z^{(L)}\right)$
- $\sigma$ being the sigmoid function used as the activation function

Note that $L$ is used as a layer indicator, not as an index.
Now:

$$\frac{\partial a^{(L)}}{\partial z^{(L)}} = \sigma'\left(z^{(L)}\right)$$

with $\sigma'$ being the derivative of the sigmoid function.
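(For reference, the elided factors above are just the standard chain-rule decomposition for the last layer; spelled out, it reads:)

$$\frac{\partial C_0}{\partial w^{(L)}} = \frac{\partial z^{(L)}}{\partial w^{(L)}} \cdot \frac{\partial a^{(L)}}{\partial z^{(L)}} \cdot \frac{\partial C_0}{\partial a^{(L)}}$$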
The problem:
But in this post by James Loy on building a simple neural network from scratch with Python, when doing the backpropagation he didn't give $z^{(L)}$ as the input to $\sigma'$ to replace $\frac{\partial a^{(L)}}{\partial z^{(L)}}$ in the chain rule. Instead, he gave the output (the last activation) of layer $L$ as the input to the sigmoid derivative $\sigma'$:
```python
def feedforward(self):
    self.layer1 = sigmoid(np.dot(self.input, self.weights1))
    self.output = sigmoid(np.dot(self.layer1, self.weights2))

def backprop(self):
    # application of the chain rule to find derivative of the loss function
    # with respect to weights2 and weights1
    d_weights2 = np.dot(self.layer1.T,
                        (2 * (self.y - self.output) * sigmoid_derivative(self.output)))
```
Note that in the code above, the layer $L$ is layer 2, which is the last (output) layer. And `sigmoid_derivative(self.output)` is where the activation of the current layer is given as input to the derivative of the sigmoid function used as the activation function.
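(For readers who want to run the excerpt: here is a minimal self-contained sketch of the surrounding class, assuming a single hidden layer of 4 units as in the post; the `sigmoid` helpers are the ones quoted further below, and any detail not quoted above is my reconstruction rather than a verbatim copy.)

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1.0 - x)

class NeuralNetwork:
    def __init__(self, x, y):
        self.input = x
        self.weights1 = np.random.rand(self.input.shape[1], 4)  # hidden layer of 4 units (assumed)
        self.weights2 = np.random.rand(4, 1)
        self.y = y
        self.output = np.zeros(y.shape)

    def feedforward(self):
        self.layer1 = sigmoid(np.dot(self.input, self.weights1))
        self.output = sigmoid(np.dot(self.layer1, self.weights2))

    def backprop(self):
        # chain rule for the output layer (the line discussed in this question)
        d_weights2 = np.dot(self.layer1.T,
                            (2 * (self.y - self.output) * sigmoid_derivative(self.output)))
        # chain rule continued one layer back, for weights1
        d_weights1 = np.dot(self.input.T,
                            (np.dot(2 * (self.y - self.output) * sigmoid_derivative(self.output),
                                    self.weights2.T) * sigmoid_derivative(self.layer1)))
        self.weights1 += d_weights1
        self.weights2 += d_weights2
```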
The question:
Shouldn't we use `sigmoid_derivative(np.dot(self.layer1, self.weights2))` instead of `sigmoid_derivative(self.output)`?
It turned out that $\sigma\left(z^{(L)}\right)$, i.e. `output`, was used just to accommodate the way `sigmoid_derivative` was implemented.
Here is the code of `sigmoid_derivative`:

```python
def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    return x * (1.0 - x)
```
The mathematical formula of the sigmoid derivative can be written as:

$$\sigma'(x) = \sigma(x) \cdot (1 - \sigma(x))$$
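(This identity is standard calculus; a one-line derivation for completeness:)

$$\sigma'(x) = \frac{d}{dx}\left(1 + e^{-x}\right)^{-1} = \frac{e^{-x}}{\left(1 + e^{-x}\right)^{2}} = \frac{1}{1 + e^{-x}} \cdot \frac{e^{-x}}{1 + e^{-x}} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr)$$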
So, to get to the formula above, $\sigma(z)$ and not $z$ was passed to `sigmoid_derivative`, in order to return $\sigma(z) \cdot (1 - \sigma(z))$.
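A quick numerical check (my own sketch, not from the post) confirms that passing the activation to `sigmoid_derivative` reproduces $\sigma'(z)$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    # expects the activation sigma(z), not the pre-activation z
    return x * (1.0 - x)

z = np.array([-2.0, 0.0, 3.0])   # pre-activations z(L)
a = sigmoid(z)                   # activations a(L) = sigma(z(L))

print(sigmoid_derivative(a))                # x * (1 - x) with x = sigma(z)
print(sigmoid(z) * (1.0 - sigmoid(z)))      # analytic sigma'(z): same values
```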