Tags: python, neural-network, backpropagation

Backpropagation: Why doesn't the error approach zero when it is multiplied by the derivative of the sigmoid?


I'm trying to implement backpropagation for my simple neural network, which looks like this: 2 inputs, 2 hidden units (sigmoid), 1 output (sigmoid). But it doesn't seem to work properly.

import numpy as np

# Set inputs and labels
X = np.array([ [0, 1],
               [0, 1],
               [1, 0],
               [1, 0] ])

Y = np.array([[0, 0, 1, 1]]).T

# Make random always the same
np.random.seed(1)
# Initialize weights
w_0 = 2 * np.random.rand(2, 2) - 1
w_1 = 2 * np.random.rand(1, 2) - 1

# Learning Rate
lr = 0.1

# Sigmoid Function/Derivative of Sigmoid Function
def sigmoid(x, deriv=False):
    if(deriv==True):
        # x is expected to already be a sigmoid activation
        return x * (1 - x)
    return 1/(1 + np.exp(-x))

# Neural network
def network(x, y, w_0, w_1):
    inputs = np.array(x, ndmin=2).T
    label = np.array(y, ndmin=2).T

    # Forward Pass
    hidden = sigmoid(np.dot(w_0, inputs))
    output = sigmoid(np.dot(w_1, hidden))

    # Calculate error and delta
    error = label - output
    delta = error * sigmoid(output, True)

    hidden_error = np.dot(w_1.T, error)
    delta_hidden = error * sigmoid(hidden, True)

    # Update weights
    w_1 += np.dot(delta, hidden.T) * lr
    w_0 += np.dot(delta_hidden, inputs.T) * lr

    return error

# Train
for i in range(6000):
    for j in range(X.shape[0]):
        error = network(X[j], Y[j], w_0, w_1)

        if(i%1000==0):
            print(error)

When I print out my error I get the values in Figure 1.

This isn't right, because the error isn't close to 0.

When I change delta to:

delta = error

it somehow works (Figure 2).

But why? Shouldn't we multiply the error by the derivative of the sigmoid function before we pass it further back?


Solution

  • I think it should be

    delta_hidden = hidden_error * sigmoid(hidden, True)

    The delta for the hidden layer has to be computed from hidden_error, the error backpropagated through w_1 to the hidden layer, not from the raw output error.
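
    For context, here is roughly how the backward pass looks with that one-line change applied; this is just a sketch that keeps everything else from the question's code (sigmoid, w_0, w_1, lr, inputs, label) unchanged:

    # Forward pass (unchanged)
    hidden = sigmoid(np.dot(w_0, inputs))
    output = sigmoid(np.dot(w_1, hidden))

    # Output-layer error and delta (unchanged)
    error = label - output
    delta = error * sigmoid(output, True)

    # Propagate the error back through w_1 and build the hidden-layer
    # delta from hidden_error, not from the raw output error
    hidden_error = np.dot(w_1.T, error)
    delta_hidden = hidden_error * sigmoid(hidden, True)

    # Weight updates (unchanged)
    w_1 += np.dot(delta, hidden.T) * lr
    w_0 += np.dot(delta_hidden, inputs.T) * lr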