I'm working on a simple neural network from scratch, using the Pima Indians onset of diabetes dataset that can be downloaded from the UCI Machine Learning Repository. When I run my code the error rate is exactly the same on every iteration, and I don't understand why this is happening; if I use XOR data instead, it works fine.
Here is my code:
## Load Dependencies
import numpy as np
from sklearn.preprocessing import MinMaxScaler
## Seeding to reproduce random generated results
np.random.seed(1)
## We take input (X) and output (y)
data = np.loadtxt('diabetes.txt', delimiter=',')
scaler = MinMaxScaler()
scaler.fit(data)
data = scaler.transform(data)
X = data[:,0:8]
y = data[:,8].reshape(768,1)
## Define our activation function, in our case we will use sigmoid function: 1 / (1 + exp(-x))
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
## Initialize weights with random values
wh = 2 * np.random.random((8, 768)) - 1
wo = 2 * np.random.random((768, 1)) - 1
# Training time
for i in range(1000):
    ## Forward propagation
    h0 = X
    ## input * weight + bias, then activate
    h1 = sigmoid(np.dot(h0, wh))
    outl = sigmoid(np.dot(h1, wo))
    ## Compute the error of the predicted output layer against the actual result
    errorout = y - outl
    ## Compute the slope (gradient/derivative) of the hidden and output layers;
    ## the gradient of the sigmoid can be returned as x * (1 - x).
    ## Compute the change factor (delta) at the output layer,
    ## dependent on the error multiplied by the slope of the output layer activation
    deltaoutl = errorout * sigmoid(outl, deriv=True)
    ## At this step the error propagates back into the network, i.e. we get the error at the hidden layer.
    ## For this, we take the dot product of the output layer delta with the weights of the edges
    ## between the hidden and output layer (wo.T).
    errorh1 = np.dot(deltaoutl, wo.T)
    ## Compute the change factor (delta) at the hidden layer: multiply the error at the hidden layer
    ## with the slope of the hidden layer activation
    deltah1 = errorh1 * sigmoid(h1, deriv=True)
    ## Print error values
    if i % 10000:
        print("Error :" + str(np.mean(np.abs(errorout))))
    ## Update weights at the output and hidden layer:
    ## the weights in the network can be updated from the errors calculated for the training examples.
    wh += np.dot(h0.T, deltah1)
    wo += np.dot(h1.T, deltaoutl)
And the result is:
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
...
If we change the data to:
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0],
              [1],
              [1],
              [0]])
wh = 2 * np.random.random((2, 4)) - 1
wo = 2 * np.random.random((4, 1)) - 1
it works the way it should. I don't understand why this is happening; can someone please enlighten me? Thank you.
A few observations about your neural network:
Since your loss is going absolutely nowhere, I suspect the lack of a bias is your problem. Your initialization is probably putting you somewhere in the flat part of the sigmoid, and with no bias there is no way out.
If adding a bias doesn't fix your problem, try using logistic loss instead of linear loss, a smaller learning rate, and ReLU activation functions in your hidden layer.
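For example, a ReLU helper in the same style as your sigmoid function, and a logistic (cross-entropy) loss, could look roughly like this (relu and logistic_loss are just illustrative names, not something from your code):
def relu(x, deriv=False):
    ## ReLU: max(0, x); called on the activated values, its derivative is 1 where they are positive, 0 otherwise
    if deriv:
        return (x > 0).astype(float)
    return np.maximum(0, x)

def logistic_loss(y_true, y_pred, eps=1e-12):
    ## Logistic (cross-entropy) loss; with a sigmoid output its gradient w.r.t. the
    ## pre-activation simplifies to (y_pred - y_true), so if you keep the += updates
    ## the output-layer delta becomes simply (y - outl)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))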
UPDATE:
I took a closer look and found a major bug: Your weight matrices have the wrong shapes. Here are the correct shapes:
n_hidden_units = 16
## Initialize weights with random values
wh = np.random.uniform(size=(X.shape[1], n_hidden_units))
wo = np.random.uniform(size=(n_hidden_units, 1))
This change alone didn't fix the problem; I also had to experiment with the learning rate. I found 0.01 to work well, although the value was sensitive to the number of hidden units I used.
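In case it isn't obvious where the learning rate goes: it just scales the weight updates at the end of the loop (lr is my name for it), e.g.:
lr = 0.01  ## step size; tune this together with n_hidden_units
wh += lr * np.dot(h0.T, deltah1)
wo += lr * np.dot(h1.T, deltaoutl)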
I also got slightly better performance by adding a bias term. You can do this with:
X = np.c_[X, np.ones(768)]
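Note that if you append the bias column before initializing the weights, the wh shape above (which uses X.shape[1]) picks up the extra input automatically; if you add it afterwards, wh needs one more row, i.e. (9, n_hidden_units) for this dataset.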