Let's evaluate the usage of this line in the block of code given below.
L1_delta = L1_error * nonlin(L1,True) # line 36
import numpy as np # line 1

# sigmoid function
def nonlin(x, deriv=False):
    if deriv == True:
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset
X = np.array([ [0,0,1],
               [0,1,1],
               [1,0,1],
               [1,1,1] ])

# output dataset
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for iter in range(1000):
    # forward propagation
    L0 = X
    L1 = nonlin(np.dot(L0, syn0))

    # how much did we miss?
    L1_error = y - L1

    # multiply how much we missed by the
    # slope of the sigmoid at the values in L1
    L1_delta = L1_error * nonlin(L1, True) # line 36

    # update weights
    syn0 += np.dot(L0.T, L1_delta)

print("Output After Training:")
print(L1)
I wanted to know: is this line required? Why do we need the factor of the derivative of the sigmoid?
I have seen many similar logistic regression examples where the derivative of the sigmoid is not used, for example https://github.com/chayankathuria/LogReg01/blob/master/GradientDescent.py
Yes, the line is indeed required. You need the derivative of the activation function (in this case the sigmoid) because the final output depends on the weights only implicitly, through the activation. That is why you have to apply the chain rule, and the derivative of the sigmoid appears as one of its factors.
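To make that concrete, here is a minimal sketch (not part of your code) that assumes the update above is gradient descent on the squared error E = 0.5 * sum((y - L1)**2), the loss the update rule implicitly minimizes; the variable names simply mirror your snippet. It compares the chain-rule gradient, which contains the nonlin(L1,True) factor from line 36, against a numerical finite-difference gradient of E, and the two agree.

import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        return x * (1 - x)              # sigmoid derivative, expressed via its output
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]], dtype=float)
y = np.array([[0, 0, 1, 1]], dtype=float).T
np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

def loss(w):
    # assumed squared-error objective behind the update rule in the question
    return 0.5 * np.sum((y - nonlin(X @ w)) ** 2)

# chain rule: dE/dw = dE/dL1 * dL1/dz * dz/dw
#                   = X.T @ ( -(y - L1) * L1*(1 - L1) )
L1 = nonlin(X @ syn0)
analytic = X.T @ (-(y - L1) * nonlin(L1, True))

# central finite differences as an independent check
eps = 1e-6
numeric = np.zeros_like(syn0)
for i in range(syn0.shape[0]):
    w_plus, w_minus = syn0.copy(), syn0.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric[i] = (loss(w_plus) - loss(w_minus)) / (2 * eps)

print(np.allclose(analytic, numeric))   # True: the sigmoid-derivative factor belongs in the gradient

Dropping the nonlin(L1,True) factor would give you the gradient of a different objective. The logistic regression examples you link to most likely use the cross-entropy loss instead of the squared error: with cross-entropy and a sigmoid output, the sigmoid derivative cancels against the derivative of the loss, so the gradient reduces to np.dot(L0.T, L1 - y) and the factor never appears explicitly. With the squared error used here, nothing cancels, so line 36 has to keep that term.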
I recommend you take a look at this post regarding backpropagation: https://datascience.stackexchange.com/questions/28719/a-good-reference-for-the-back-propagation-algorithm
It explains the mathematics behind backpropagation quite well.