neural-network, gradient-descent

Does the correction to the weights also include the derivative of the sigmoid function?


Let's evaluate the usage of this line in the block of code given below: L1_delta = L1_error * nonlin(L1,True) # line 36

import numpy as np #line 1

# sigmoid function and its derivative
def nonlin(x,deriv=False):
    if(deriv==True):
        # note: x is expected to already be a sigmoid output here,
        # so sigmoid'(z) = sigmoid(z)*(1-sigmoid(z)) = x*(1-x)
        return x*(1-x)
    return 1/(1+np.exp(-x))

# input dataset
X = np.array([  [0,0,1],
                [0,1,1],
                [1,0,1],
                [1,1,1] ])

# output dataset            
y = np.array([[0,0,1,1]]).T

# seed random numbers to make calculation
# deterministic (just a good practice)
np.random.seed(1)

# initialize weights randomly with mean 0
syn0 = 2*np.random.random((3,1)) - 1

for iter in range(1000):

    # forward propagation
    L0 = X
    L1 = nonlin(np.dot(L0,syn0))

    # how much did we miss?
    L1_error = y - L1

    # multiply how much we missed by the 
    # slope of the sigmoid at the values in L1
    L1_delta = L1_error * nonlin(L1,True) # line 36

    # update weights
    syn0 += np.dot(L0.T,L1_delta)

print ("Output After Training:")
print (L1)

I wanted to know: is this line required? Why do we need the factor of the derivative of the sigmoid?

I have seen many similar logistic regression examples where the derivative of the sigmoid is not used, for example https://github.com/chayankathuria/LogReg01/blob/master/GradientDescent.py


Solution

  • Yes, the line is indeed required. You need the derivative of the activation function (in this case the sigmoid) because your final output depends on the weights only indirectly, through the activation. That's why you need to apply the chain rule, in which the derivative of the sigmoid appears (see the sketch after this answer).

    I recommend taking a look at this post regarding backpropagation: https://datascience.stackexchange.com/questions/28719/a-good-reference-for-the-back-propagation-algorithm

    It explains the mathematics behind backpropagation quite well.
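
To make the chain rule concrete, here is a minimal sketch. It assumes the update rule in the question is full-batch gradient descent with a learning rate of 1 on the squared-error loss E = 0.5 * sum((y - L1)**2); the code above never names its loss, but this is the loss whose negative gradient matches that update. Under that assumption the chain rule gives dE/dsyn0 = -L0.T.dot((y - L1) * sigmoid'(z)), and a finite-difference check confirms that the sigmoid-derivative factor really belongs there:

import numpy as np

def nonlin(x, deriv=False):
    if deriv:
        # x is assumed to already be a sigmoid output, so sigmoid'(z) = x*(1-x)
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))

X = np.array([[0, 0, 1],
              [0, 1, 1],
              [1, 0, 1],
              [1, 1, 1]])
y = np.array([[0, 0, 1, 1]]).T

np.random.seed(1)
syn0 = 2 * np.random.random((3, 1)) - 1

def loss(w):
    # assumed squared-error loss; its negative gradient is exactly the update in the question
    return 0.5 * np.sum((y - nonlin(X.dot(w))) ** 2)

# analytic gradient via the chain rule: dE/dw = -X.T @ ((y - L1) * sigmoid'(z))
L1 = nonlin(X.dot(syn0))
analytic = -X.T.dot((y - L1) * nonlin(L1, True))

# numerical gradient by central finite differences
eps = 1e-6
numeric = np.zeros_like(syn0)
for i in range(syn0.shape[0]):
    w_plus, w_minus = syn0.copy(), syn0.copy()
    w_plus[i] += eps
    w_minus[i] -= eps
    numeric[i] = (loss(w_plus) - loss(w_minus)) / (2 * eps)

print(np.allclose(analytic, numeric, atol=1e-8))  # True: the derivative factor is required

Dropping nonlin(L1, True) from the analytic expression makes the check fail. As for the linked logistic regression example: if it minimizes the cross-entropy (log) loss, the gradient works out to X.T.dot(L1 - y), i.e. the sigmoid derivative cancels algebraically against the derivative of the log loss, which is why no explicit sigmoid-derivative term appears there.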