Tags: python, neural-network, backpropagation

Neural Network with two neurons


I was trying to implement a simple neural network from scratch in Python. The network has only two neurons, and the task is to match the input to the output (i.e. x = 0 --> output = 0, x = 1 --> output = 1).

I have used partial derivatives and I am trying to maximize the negative loss using gradient ascent. (The complete code is shown below.) Even after training for more than 10,000 iterations, the output is not good enough. (Perhaps the loss is stuck at a local maximum.) Can anyone help me figure out what's wrong with my implementation?
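For reference, with y = sigmoid(w[0] * x), z = sigmoid(w[1] * y), and E = -0.5 * sum((d - z)**2), the partial derivatives used for the gradient-ascent updates in the code below work out to:

dE/dw[1] = sum((d - z) * z * (1 - z) * y)
dE/dw[0] = sum((d - z) * z * (1 - z) * w[1] * y * (1 - y) * x)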

import random
import numpy as np
import math

def sigmoid(x):
  return 1 / (1 + np.exp(-x))

def error(d,z):
    return -0.5 * np.sum(np.power(d-z, 2))

# x = input
##x = np.random.choice((0,1),10000)
x = np.array([0, 1])
# d = desired output
d = np.copy(x)

# weights of two neurons
w = np.random.rand(2)

# now training using backprop
gradient = np.random.rand(2)

iterations = 800
rate = 5

k = 1
for i in range(1, iterations + 1):
    # forward pass through both neurons
    y = sigmoid(w[0] * x)
    z = sigmoid(w[1] * y)

    # partial derivatives of E = -0.5 * sum((d - z)**2) with respect to w[0] and w[1]
    gradient[0] = np.sum(z * w[1] * y * x * (d-z) * (1-y) * (1-z))
    gradient[1] = np.sum(y * z * (d-z) * (1-z))

    w[0] += gradient[0] * rate
    w[1] += gradient[1] * rate

    print "Iteration %d, Error %f, Change %f" % (i, error(d,z), ((gradient[0] * rate) ** 2 + (gradient[1] * rate) ** 2)**0.5)

    change = ((gradient[0] * rate) ** 2 + (gradient[1] * rate) ** 2)**0.5

    if change < 0.00001:
        break

## now test
print "1",
x = 1
y = sigmoid(w[0]*x)
z = sigmoid(w[1]*y)
print z

print "0",
x = 0
y = sigmoid(w[0]*x)
z = sigmoid(w[1]*y)
print z

Solution

  • Your simple network cannot learn this function.

    The problem is the lack of a bias term in the neurons. If we call your two weights W1 and W2, you can see the problem:

    • If the input is 0, then W1 makes no difference: the output of the first layer is sigmoid(0) = 0.5, and the output of the second layer will be sigmoid(0.5 * W2). To learn to output a value close to 0, the network has to make W2 large and negative.

    • If the input is 1, call the output of the first layer N, which must be between 0 and 1. The output of the second layer will be sigmoid(N * W2). Since W2 is large and negative, the best the network can do is learn a large negative weight for W1, making N close to zero. But even then it will at best output something slightly below 0.5, because sigmoid(0) is 0.5.

    Whatever weights you choose, you cannot map the inputs 0 and 1 close to the outputs 0 and 1. The solution is to add at least one bias term, in the second layer, although it would be more normal to have a bias on each neuron.
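    For illustration, here is a rough sketch of the same two-neuron network with a bias added to each neuron (the bias array b, the iteration count, and the learning rate below are arbitrary choices, not taken from the question). With the biases it can learn the mapping:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.0, 1.0])
d = np.copy(x)               # desired output equals the input

w = np.random.rand(2)        # one weight per neuron, as before
b = np.random.rand(2)        # one bias per neuron (the missing ingredient)

rate = 5
for i in range(10000):
    # forward pass
    y = sigmoid(w[0] * x + b[0])
    z = sigmoid(w[1] * y + b[1])

    # gradient ascent on E = -0.5 * sum((d - z)**2), as in the question
    delta2 = (d - z) * z * (1 - z)        # error signal at the output neuron
    delta1 = delta2 * w[1] * y * (1 - y)  # backpropagated to the first neuron

    w[1] += rate * np.sum(delta2 * y)
    b[1] += rate * np.sum(delta2)
    w[0] += rate * np.sum(delta1 * x)
    b[0] += rate * np.sum(delta1)

# test: both outputs should now be close to their targets 0 and 1
for xi in (0, 1):
    print(xi, sigmoid(w[1] * sigmoid(w[0] * xi + b[0]) + b[1]))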