python, matlab, machine-learning, neural-network, gradient-descent

Gradient Descent ANN - What is MATLAB doing that I'm not?


I'm trying to recreate a simple MLP artificial neural network in Python using gradient descent backpropagation. My goal is to reproduce the accuracies that MATLAB's ANN is producing, but I'm not even getting close. I'm using the same parameters as MATLAB: the same number of hidden nodes (20), 1000 epochs, a learning rate (alpha) of 0.01, and the same data (obviously), yet my code makes no progress on improving results, whereas MATLAB is reaching accuracies in the region of 98%.

I've attempted to debug through MATLAB to see what it's doing, but I've not had much luck. I believe MATLAB scales the input data between 0 and 1, and adds a bias to the input, both of which I've used in my Python code.

What is MATLAB doing that produces results so much higher? Or, probably more likely, what have I done wrong in my Python code that produces such poor results? All I can think of is poor initialisation of the weights, incorrectly reading in the data, incorrect manipulation of the data for processing, or an incorrect/poor choice of activation function (I've tried tanh as well, with the same result).

My attempt is below, based on code I found online and tweaked slightly to read in my data, followed by the MATLAB script (just 11 lines of code). At the bottom is a link to the datasets I used (which I also obtained through MATLAB):

Thanks for any help.

Main.py

import numpy as np
import Process
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.model_selection import train_test_split  # sklearn.cross_validation in older scikit-learn versions
from sklearn.preprocessing import LabelBinarizer
import warnings


def sigmoid(x):
    return 1.0/(1.0 + np.exp(-x))


def sigmoid_prime(x):
    return sigmoid(x)*(1.0-sigmoid(x))


class NeuralNetwork:

    def __init__(self, layers):

        self.activation = sigmoid
        self.activation_prime = sigmoid_prime

        # Set weights
        self.weights = []
        # layers = [2,2,1]
        # range of weight values (-1,1)
        # input and hidden layers - random((2+1, 2+1)) : 3 x 3
        for i in range(1, len(layers) - 1):
            r = 2*np.random.random((layers[i-1] + 1, layers[i] + 1)) - 1
            self.weights.append(r)
        # output layer - random((2+1, 1)) : 3 x 1
        r = 2*np.random.random((layers[i] + 1, layers[i+1])) - 1
        self.weights.append(r)

    def fit(self, X, y, learning_rate, epochs):
        # Add column of ones to X
        # This is to add the bias unit to the input layer
        ones = np.atleast_2d(np.ones(X.shape[0]))
        X = np.concatenate((ones.T, X), axis=1)

        for k in range(epochs):

            # Pick one random training sample: each epoch performs a single
            # stochastic weight update rather than a full pass over the data
            i = np.random.randint(X.shape[0])
            a = [X[i]]

            for l in range(len(self.weights)):
                dot_value = np.dot(a[l], self.weights[l])
                activation = self.activation(dot_value)
                a.append(activation)
            # output layer
            error = y[i] - a[-1]
            deltas = [error * self.activation_prime(a[-1])]

            # we need to begin at the second to last layer
            # (a layer before the output layer)
            for l in range(len(a) - 2, 0, -1):
                deltas.append(deltas[-1].dot(self.weights[l].T)*self.activation_prime(a[l]))

            # reverse
            # [level3(output)->level2(hidden)]  => [level2(hidden)->level3(output)]
            deltas.reverse()

            # backpropagation
            # 1. Multiply its output delta and input activation
            #    to get the gradient of the weight.
            # 2. Subtract a ratio (percentage) of the gradient from the weight.
            for i in range(len(self.weights)):
                layer = np.atleast_2d(a[i])
                delta = np.atleast_2d(deltas[i])
                self.weights[i] += learning_rate * layer.T.dot(delta)

    def predict(self, x):
        a = np.concatenate((np.ones(1).T, np.array(x)))
        for l in range(0, len(self.weights)):
            a = self.activation(np.dot(a, self.weights[l]))
        return a

# Create neural net, 13 inputs, 20 hidden nodes, 3 outputs
nn = NeuralNetwork([13, 20, 3])
data = Process.readdata('wine')
# Split data out into input and output
X = data[0]
y = data[1]
# Normalise input data between 0 and 1.
X -= X.min()
X /= X.max()

# Split data into training and test sets (15% testing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15)

# Create binary (one-hot) output form for training
y_ = LabelBinarizer().fit_transform(y_train)

# Train data
lrate = 0.01
epoch = 1000
nn.fit(X_train, y_, lrate, epoch)

# Test data
err = []
for e in X_test:
    # Create array of output data (argmax to get classification)
    err.append(np.argmax(nn.predict(e)))

# Hide warnings. An UndefinedMetricWarning is thrown by the classification report when a class receives no predictions.
warnings.filterwarnings('ignore')
# Produce confusion matrix and classification report
print(confusion_matrix(y_test, err))
print(classification_report(y_test, err))

# Plot actual and predicted data
plt.figure(figsize=(10, 8))
target, = plt.plot(y_test, color='b', linestyle='-', lw=1, label='Target')
estimated, = plt.plot(err, color='r', linestyle='--', lw=3, label='Estimated')
plt.legend(handles=[target, estimated])
plt.xlabel('# Samples')
plt.ylabel('Classification Value')
plt.grid()
plt.show()
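
For a single headline accuracy figure to set against MATLAB's reported accuracy, sklearn's accuracy_score could be appended after the test loop. This is an optional addition, not part of the original script, and it assumes (as the confusion matrix above already does) that the argmax indices line up with the class labels in y_test:

from sklearn.metrics import accuracy_score

# Fraction of test samples whose argmax prediction matches the true class
print('Test accuracy: {:.1f}%'.format(100 * accuracy_score(y_test, err)))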

Process.py

import csv
import numpy as np


# Add constant column of 1's
def addones(arrayvar):
    return np.hstack((np.ones((arrayvar.shape[0], 1)), arrayvar))


def readdata(loc):
    # Open the file and work out the number of columns and the number of rows. The row count needs +1 because
    # the 'next' call used for num_cols has already consumed the first row.
    with open(loc + '.input.csv') as f:
        file = csv.reader(f, delimiter=',', skipinitialspace=True)
        num_cols = len(next(file))
        num_rows = len(list(file))+1

    # Create zeroed arrays based on the number of columns and rows found above.
    x = np.zeros((num_rows, num_cols))
    y = np.zeros(num_rows)

    # INPUT #
    # Loop through the input file and put each row into a new row of 'samples'
    with open(loc + '.input.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            x[count] = row
            count += 1

    # OUTPUT #
    # Do the same and loop through the output file.
    with open(loc + '.output.csv', newline='') as csvfile:
        file = csv.reader(csvfile, delimiter=',')
        count = 0
        for row in file:
            y[count] = row[0]
            count += 1

    # Set data type
    x = np.array(x).astype(float)
    y = np.array(y).astype(int)

    return x, y
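
As an aside, if the CSV files are purely numeric with no header row, the same loading can be done more concisely with numpy's loadtxt. This is only a sketch of an alternative, assuming the same <name>.input.csv / <name>.output.csv layout as above:

import numpy as np


def readdata_loadtxt(loc):
    # Read the feature matrix and the class labels in one call each;
    # assumes comma-separated, purely numeric files with no header.
    x = np.loadtxt(loc + '.input.csv', delimiter=',', dtype=float)
    y = np.loadtxt(loc + '.output.csv', delimiter=',', dtype=float)
    if y.ndim > 1:  # keep only the first column if several are present
        y = y[:, 0]
    return x, y.astype(int)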

MATLAB script

%% LOAD DATA 
[x1,t1] = wine_dataset;

%% SET UP NN 
net = patternnet(20); 
net.trainFcn = 'traingd'; 
net.layers{2}.transferFcn = 'logsig'; 
net.derivFcn = 'logsig';

%% TRAIN AND TEST
[net,tr] = train(net,x1,t1);

Data files can be downloaded here: input output


Solution

  • I believe I've found the problem. It was a combination of the dataset itself (the problem didn't occur with all datasets) and the way in which I scaled the data. My original scaling method, which mapped the values to between 0 and 1, was not helping, and caused the poor results seen:

    # Normalise input data between 0 and 1.
    X -= X.min()
    X /= X.max()
    

    I've found another scaling method, provided by the sklearn preprocessing package:

    from sklearn import preprocessing
    X = preprocessing.scale(X)
    

    This scaling method does not map the data to between 0 and 1, and I still need to investigate exactly why it has helped so much, but results are now coming back with an accuracy of 96 to 100%, very much on par with the MATLAB results, which I figure use a similar (or the same) preprocessing scaling method (see the sketch after this answer).

    As I said above, this isn't the case with all datasets: using the built-in sklearn iris or digits datasets seemed to produce good results without scaling.
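
    To make the difference concrete, here is a minimal sketch (the sample values are made up purely for illustration). The original code uses a single global min and max for the whole matrix, so columns with small numeric ranges get squashed towards zero, whereas preprocessing.scale standardises each column to zero mean and unit variance:

    import numpy as np

    # Three made-up samples with features of very different ranges
    X = np.array([[12.8, 1.2, 1060.0],
                  [13.5, 2.9,  735.0],
                  [13.1, 2.1,  480.0]])

    # Global min-max, as in the original code: one min and one max for the
    # whole matrix, so the first two columns end up in a tiny interval.
    X_shifted = X - X.min()
    X_global = X_shifted / X_shifted.max()

    # Per-column standardisation, roughly what preprocessing.scale does:
    # every feature ends up with zero mean and unit variance.
    X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

    For what it's worth, MATLAB's patternnet also applies its own per-feature preprocessing to the inputs by default (mapminmax, as far as I'm aware), rather than a single global rescaling, which may be part of why it did so well out of the box.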