Tags: python, numpy, machine-learning, neural-network, backpropagation

Neural Network seems to be getting stuck on a single output with each execution


I've created a neural network with numpy to estimate the sin(x) function for an input x. The network has 21 output neurons (representing the numbers -1.0, -0.9, ..., 0.9, 1.0), but it does not learn; I think I implemented the neuron architecture incorrectly when I defined the feedforward mechanism.

When I execute the code, the number of test points it estimates correctly sits around 48/1000, which happens to be the average number of data points per category if you split 1000 test points evenly between 21 categories. Looking at the network output, the network seems to just pick a single output value for every input. For example, it may return -0.5 as the estimate for y regardless of the x you give it. Where did I go wrong here? This is my first network. Thank you!

import random
import numpy as np
import math
class Network(object):

    def __init__(self, inputLayerSize, hiddenLayerSize, outputLayerSize):

        #Create weight arrays to represent each layer and initialize them randomly on a Gaussian distribution.
        self.layer1 = np.random.randn(hiddenLayerSize, inputLayerSize)
        self.layer1_activations = np.zeros((hiddenLayerSize, 1))
        self.layer2 = np.random.randn(outputLayerSize, hiddenLayerSize)
        self.layer2_activations = np.zeros((outputLayerSize, 1))

        self.outputLayerSize = outputLayerSize
        self.inputLayerSize = inputLayerSize
        self.hiddenLayerSize = hiddenLayerSize

        # print(self.layer1)
        # print()
        # print(self.layer2)

        # self.weights = [np.random.randn(y,x)
        #                 for x, y in zip(sizes[:-1], sizes[1:])]

    def feedforward(self, network_input):

        #Propagate forward through the network as if doing this by hand.
        #first layer's output activations:
        for neuron in range(self.hiddenLayerSize):
            self.layer1_activations[neuron] = 1/(1+np.exp(network_input * self.layer1[neuron]))

        #second layer's output activations use layer1's activations as input:
        for neuron in range(self.outputLayerSize):
            for weight in range(self.hiddenLayerSize):
                self.layer2_activations[neuron] += self.layer1_activations[weight]*self.layer2[neuron][weight]
            self.layer2_activations[neuron] = 1/(1+np.exp(self.layer2_activations[neuron]))

        #convert layer 2 activation numbers to a single output. The neuron (weight vector) with the highest activation will be the output.
        outputs = [x / 10 for x in range(-int((self.outputLayerSize/2)), int((self.outputLayerSize/2))+1, 1)] #range(-10, 11, 1)

        return(outputs[np.argmax(self.layer2_activations)])

    def train(self, training_pairs, epochs, minibatchsize, learn_rate):
        #apply gradient descent
        test_data = build_sinx_data(1000)
        for epoch in range(epochs):
            random.shuffle(training_pairs)
            minibatches = [training_pairs[k:k + minibatchsize] for k in range(0, len(training_pairs), minibatchsize)]
            for minibatch in minibatches:
                loss = 0 #calculate loss for each minibatch

                #Begin training
                for x, y in minibatch:
                    network_output = self.feedforward(x)
                    loss += (network_output - y) ** 2
                    #adjust weights by abs(loss)*sigmoid(network_output)*(1-sigmoid(network_output))*learn_rate
                loss /= (2*len(minibatch))
                adjustWeights = loss*(1/(1+np.exp(-network_output)))*(1-(1/(1+np.exp(-network_output))))*learn_rate
                self.layer1 += adjustWeights
                #print(adjustWeights)
                self.layer2 += adjustWeights
                #when line 63 placed here, results did not improve during minibatch.
            print("Epoch {0}: {1}/{2} correct".format(epoch, self.evaluate(test_data), len(test_data)))
        print("Training Complete")

    def evaluate(self, test_data):
        """
        Returns the number of test inputs which the network evaluates correctly.
        The output is assumed to be the neuron in the output layer with the highest activation.
        :param test_data: test data set identical in form to the training data set.
        :return: integer sum
        """
        correct = 0
        for x, y in test_data:
            output = self.feedforward(x)
            if output == y:
                correct += 1
        return(correct)

def build_sinx_data(data_points):
    """
    Creates a list of tuples (x value, expected y value) for the sin(x) function.
    :param data_points: number of desired data points
    :return: list of tuples (x value, expected y value)
    """
    x_vals = []
    y_vals = []
    for i in range(data_points):
        #parameter of randint signifies range of x values to be used*10
        x_vals.append(random.randint(-2000,2000)/10)
        y_vals.append(round(math.sin(x_vals[i]),1))
    return (list(zip(x_vals,y_vals)))
# training_pairs, epochs, minibatchsize, learn_rate

sinx_test = Network(1,21,21)
print(sinx_test.feedforward(10))
sinx_test.train(build_sinx_data(600),20,10,2)
print(sinx_test.feedforward(10))

Solution

  • I didn't thoroughly examine all of your code, but some issues are clearly visible:

    • The * operator doesn't perform matrix multiplication in numpy; you have to use numpy.dot. This affects, for instance, these lines: network_input * self.layer1[neuron], self.layer1_activations[weight]*self.layer2[neuron][weight], etc.
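      For illustration, here is a minimal sketch (my own helper names, not a drop-in replacement for your feedforward) of what the forward pass could look like with numpy.dot, given the shapes from your constructor (layer1 is (hidden, input), layer2 is (output, hidden)):

      import numpy as np

      def sigmoid(z):
          return 1 / (1 + np.exp(-z))

      def feedforward_sketch(layer1, layer2, network_input):
          # wrap the scalar input into an (inputLayerSize, 1) column vector
          x = np.atleast_2d(network_input).reshape(-1, 1)
          hidden = sigmoid(np.dot(layer1, x))        # shape (hiddenLayerSize, 1)
          output = sigmoid(np.dot(layer2, hidden))   # shape (outputLayerSize, 1)
          return output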

    • It seems like you are solving your problem via classification (selecting 1 out of 21 classes), but using an L2 loss. This is somewhat mixed up. You have two options: either stick to classification and use a cross-entropy loss function, or perform regression (i.e. predict the numeric value) with an L2 loss.
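      As a hedged illustration of the two options (these helper functions are mine, not part of your code):

      import numpy as np

      # Option 1: regression — a single output value compared to the target with L2 (squared) loss
      def l2_loss(prediction, target):
          return 0.5 * (prediction - target) ** 2

      # Option 2: classification — 21 class probabilities (e.g. from a softmax) scored with cross-entropy
      def cross_entropy_loss(class_probabilities, true_class_index):
          return -np.log(class_probabilities[true_class_index])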

    • You should definitely extract the sigmoid function into a helper to avoid writing the same expression all over again:

      def sigmoid(z):
        return 1 / (1 + np.exp(-z))
      
      def sigmoid_derivative(x):
        return sigmoid(x) * (1 - sigmoid(x))
      
    • You perform the same update on self.layer1 and self.layer2, which is clearly wrong. Take some time to analyze how exactly backpropagation works; a sketch of distinct per-layer updates follows below.
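      As a rough sketch of what distinct per-layer updates look like for a two-layer sigmoid network trained with L2 loss (my own notation and function names, not a drop-in fix for your train method):

      import numpy as np

      def sigmoid(z):
          return 1 / (1 + np.exp(-z))

      def backprop_step(layer1, layer2, x, y, learn_rate):
          # forward pass
          hidden = sigmoid(np.dot(layer1, x))               # (hidden, 1)
          output = sigmoid(np.dot(layer2, hidden))          # (output, 1)
          # backward pass: each layer gets its own gradient
          delta2 = (output - y) * output * (1 - output)     # error signal at the output layer
          grad_layer2 = np.dot(delta2, hidden.T)            # (output, hidden)
          delta1 = np.dot(layer2.T, delta2) * hidden * (1 - hidden)
          grad_layer1 = np.dot(delta1, x.T)                 # (hidden, input)
          # each weight matrix is moved by its own gradient, scaled by the learning rate
          layer1 -= learn_rate * grad_layer1
          layer2 -= learn_rate * grad_layer2
          return layer1, layer2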