I'm working on a simple neural network from scratch, using the Pima Indians onset of diabetes dataset that can be downloaded from the UCI Machine Learning Repository. When I run my code the error rate is exactly the same on every iteration, and I don't understand why this is happening; if I use XOR data instead, it works fine.
Here is my code:
## Load Dependencies
import numpy as np
from sklearn.preprocessing import MinMaxScaler
## Seeding to reproduce random generated results
np.random.seed(1)
## We take input (X) and output (y)
data = np.loadtxt('diabetes.txt', delimiter=',')
scaler = MinMaxScaler()
scaler.fit(data)
data = scaler.transform(data)
X = data[:,0:8]
y = data[:,8].reshape(768,1)
## Define our activation function, in our case we will use sigmoid function: 1 / (1 + exp(-x))
def sigmoid(x, deriv=False):
    if deriv:
        return x * (1 - x)
    return 1 / (1 + np.exp(-x))
## Initialize weights with random values
wh = 2 * np.random.random((8, 768)) - 1
wo = 2 * np.random.random((768, 1)) - 1
# Training time
for i in range(1000):
    ## Forward propagation
    h0 = X
    ## input * weight + bias, then activate
    h1 = sigmoid(np.dot(h0, wh))
    outl = sigmoid(np.dot(h1, wo))
    ## Compute the error of the predicted output layer against the actual result
    errorout = y - outl
    ## Compute the slope (gradient/derivative) of the hidden and output layers;
    ## the gradient of the sigmoid can be returned as x * (1 - x).
    ## Compute the change factor (delta) at the output layer,
    ## dependent on the error multiplied by the slope of the output layer activation
    deltaoutl = errorout * sigmoid(outl, deriv=True)
    ## At this step the error propagates back into the network, i.e. we get the error at the hidden layer.
    ## For this, we take the dot product of the output layer delta with the weights of the edges
    ## between the hidden and output layer (wo.T).
    errorh1 = np.dot(deltaoutl, wo.T)
    ## Compute the change factor (delta) at the hidden layer: multiply the error at the hidden layer
    ## with the slope of the hidden layer activation
    deltah1 = errorh1 * sigmoid(h1, deriv=True)
    ## Print error values
    if i % 10000:
        print("Error :" + str(np.mean(np.abs(errorout))))
    ## Update weights at the output and hidden layer:
    ## the weights in the network can be updated from the errors calculated for the training examples.
    wh += np.dot(h0.T, deltah1)
    wo += np.dot(h1.T, deltaoutl)
And the result is:
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
Error :0.651041666664
...
If we change the data to:
X = np.array([[0, 0],
              [0, 1],
              [1, 0],
              [1, 1]])
y = np.array([[0],
              [1],
              [1],
              [0]])
wh = 2 * np.random.random((2, 4)) - 1
wo = 2 * np.random.random((4, 1)) - 1
it works the way it should. I don't understand why this is happening; can someone please enlighten me? Thank you.
A few observations about your neural network:
Since your loss is going absolutely nowhere, I suspect the lack of a bias is your problem. Your initialization is probably putting you somewhere in the flat part of the sigmoid, and with no bias there is no way out.
If adding a bias doesn't fix your problem, try using logistic loss instead of linear loss, a smaller learning rate, and ReLU activation functions in your hidden layer.
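For example, a ReLU helper in the same style as your sigmoid function, and a logistic (cross-entropy) loss, could look roughly like this (relu and logistic_loss are just illustrative names, not something from your code):
def relu(x, deriv=False):
    ## ReLU: max(0, x); called on the activated values, its derivative is 1 where they are positive, 0 otherwise
    if deriv:
        return (x > 0).astype(float)
    return np.maximum(0, x)

def logistic_loss(y_true, y_pred, eps=1e-12):
    ## Logistic (cross-entropy) loss; with a sigmoid output its gradient w.r.t. the
    ## pre-activation simplifies to (y_pred - y_true), so if you keep the += updates
    ## the output-layer delta becomes simply (y - outl)
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))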
UPDATE:
I took a closer look and found a major bug: Your weight matrices have the wrong shapes. Here are the correct shapes:
n_hidden_units = 16
## Initialize weights with random values
wh = np.random.uniform(size=(X.shape[1], n_hidden_units))
wo = np.random.uniform(size=(n_hidden_units, 1))
This change alone didn't fix the problem; I also had to experiment with the learning rate. I found 0.01 to work well, although the value was sensitive to the number of hidden units I used.
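In case it isn't obvious where the learning rate goes: it just scales the weight updates at the end of the loop (lr is my name for it), e.g.:
lr = 0.01  ## step size; tune this together with n_hidden_units
wh += lr * np.dot(h0.T, deltah1)
wo += lr * np.dot(h1.T, deltaoutl)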
I also got slightly better performance by adding a bias term. You can do this with:
X = np.c_[X, np.ones(768)]
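Note that if you append the bias column before initializing the weights, the wh shape above (which uses X.shape[1]) picks up the extra input automatically; if you add it afterwards, wh needs one more row, i.e. (9, n_hidden_units) for this dataset.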