Tags: java, machine-learning, neural-network, backpropagation

Neural Network Backpropagation does not compute weights correctly


Currently, I am having problems with the backpropagation algorithm. I am trying to implement it and use it to recognize the direction of faces (left, right, down, straight). Basically, I have N images; I read the pixels and scale their values (0 to 255) to values from 0.0 to 1.0. All images are 32*30, so I have an input layer of 960 neurons, a hidden layer of 3 neurons and an output layer of 4 neurons. For example, the output <0.1,0.9,0.1,0.1> means that the person looks to the right. I followed the pseudocode, but it doesn't work right: it does not compute the correct weights, and consequently it can't handle the training and test examples. Here are parts of the code:

    // main function - it runs the algorithm
    private void runBackpropagationAlgorithm() {
        for (int i = 0; i < 900; ++i) {
            for (ImageUnit iu : images) {
                double [] error = calcOutputError(iu.getRatioMatrix(), iu.getClassification());
                changeHiddenUnitsOutWeights(error);
                error = calcHiddenError(error);
                changeHiddenUnitsInWeights(error,iu.getRatioMatrix());
            }
        }
    }

    // it creates the neural network
    private void createNeuroneNetwork() {
        Random generator = new Random();
        for (int i = 0; i < inHiddenUnitsWeights.length; ++i) {
            for (int j = 0; j < hiddenUnits; ++j) {
                inHiddenUnitsWeights[i][j] = generator.nextDouble();
            }
        }
        for (int i = 0; i < hiddenUnits; ++i) {
            for (int j = 0; j < 4; ++j) {
                outHddenUnitsWeights[i][j] = generator.nextDouble();
            }
        }
    }
    // Calculates the error in the network. It runs through the whole network.
    private double [] calcOutputError(double[][] input, double [] expectedOutput) {
        int currentEdge = 0;
        Arrays.fill(hiddenUnitNodeValue, 0.0);
        for (int i = 0; i < input.length; ++i) {
            for (int j = 0; j < input[0].length; ++j) {
                for (int k = 0; k < hiddenUnits; ++k) {
                    hiddenUnitNodeValue[k] += input[i][j] * inHiddenUnitsWeights[currentEdge][k];
                }
                ++currentEdge;
            }
        }
        double[] out = new double[4];
        for (int j = 0; j < 4; ++j) {
            for (int i = 0; i < hiddenUnits; ++i) {
                out[j] += outHddenUnitsWeights[i][j] * hiddenUnitNodeValue[i];
            }
        }
        double [] error = new double [4];
        Arrays.fill(error, 4);
        for (int i = 0; i < 4; ++i) {
            error[i] = ((expectedOutput[i] - out[i])*(1.0-out[i])*out[i]);
            //System.out.println((expectedOutput[i] - out[i]) + " " + expectedOutput[i] + " " +  out[i]);
        }
        return error;
    }

    // Changes the weights of the outgoing edges of the hidden neurons.
    private void changeHiddenUnitsOutWeights(double [] error) {
        for (int i = 0; i < hiddenUnits; ++i) {
            for (int j = 0; j < 4; ++j) {
                outHddenUnitsWeights[i][j] += learningRate*error[j]*hiddenUnitNodeValue[i];
            }
        }
    }

    // Goes back to the hidden units to calculate their error.
    private double [] calcHiddenError(double [] outputError) {
        double [] error = new double[hiddenUnits];
        for (int i = 0; i < hiddenUnits; ++i) {
            double currentHiddenUnitErrorSum = 0.0;
            for (int j = 0; j < 4; ++j) {
                currentHiddenUnitErrorSum += outputError[j]*outHddenUnitsWeights[i][j];
            }
            error[i] = hiddenUnitNodeValue[i] * (1.0 - hiddenUnitNodeValue[i]) * currentHiddenUnitErrorSum;
        }
        return error;
    }

    // Changes the weights of the incoming edges to the hidden neurons. input is the matrix of ratios.
    private void changeHiddenUnitsInWeights(double [] error, double[][] input) {
        int currentEdge = 0;
        for (int i = 0; i < input.length; ++i) {
            for (int j = 0; j < input[0].length; ++j) {
                for (int k = 0; k < hiddenUnits; ++k) {
                    inHiddenUnitsWeights[currentEdge][k] += learningRate*error[k]*input[i][j];
                }
                ++currentEdge;
            }
        }
    }

As the algorithm runs, it computes bigger and bigger weights, which finally overflow to infinity (NaN values). I checked the code, but alas, I didn't manage to solve my problem. I will be very grateful to anyone who tries to help me.


Solution

  • Your code is missing the transfer functions. It sounds like you want the logistic function with a softmax output. You need to include the following in calcOutputError, right after the weighted sums into the hidden layer are computed:

    // Logistic transfer function for hidden layer. 
    for (int k = 0; k < hiddenUnits; ++k) {
        hiddenUnitNodeValue[k] = logistic(hiddenUnitNodeValue[k]);
    }
    

    and, after the weighted sums into the output layer are computed:

    // Softmax transfer function for output layer.
    double sum = 0;
    for (int j = 0; j < 4; ++j) {
        out[j] = logistic(out[j]);
        sum += out[j];
    }
    for (int j = 0; j < 4; ++j) {
        out[j] = out[j] / sum;
    }
    

    where the logistic function is

    public double logistic(double x) {
        return 1 / (1 + Math.exp(-x));
    }
    

    Note that the softmax transfer function gives you outputs that sum to 1, so they can be interpreted as probabilities.
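
    Since the outputs can be read as probabilities, classifying an image then just means picking the most probable direction. Here is a minimal sketch, where out is the transformed output array from above; the label order is an assumption, since the question only fixes index 1 as "right":

    // Pick the most probable direction.
    // NOTE: the label order here is assumed, not taken from the question.
    String[] labels = {"left", "right", "down", "straight"};
    int best = 0;
    for (int j = 1; j < out.length; ++j) {
        if (out[j] > out[best]) {
            best = j;
        }
    }
    System.out.println(labels[best] + " (p = " + out[best] + ")");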

    Also, your calculation of the error gradient for the output layer is incorrect. With the softmax output it should simply be:

    for (int i = 0; i < 4; ++i) {
        error[i] = (expectedOutput[i] - out[i]);
    }
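
    Putting both fixes together, a corrected calcOutputError might look like the sketch below. It reuses the fields from the question (hiddenUnitNodeValue, inHiddenUnitsWeights, outHddenUnitsWeights, hiddenUnits), so treat it as an illustration of where the transfer functions and the simplified gradient go, not as tested code:

    private double [] calcOutputError(double[][] input, double [] expectedOutput) {
        // Weighted sums into the hidden layer.
        int currentEdge = 0;
        Arrays.fill(hiddenUnitNodeValue, 0.0);
        for (int i = 0; i < input.length; ++i) {
            for (int j = 0; j < input[0].length; ++j) {
                for (int k = 0; k < hiddenUnits; ++k) {
                    hiddenUnitNodeValue[k] += input[i][j] * inHiddenUnitsWeights[currentEdge][k];
                }
                ++currentEdge;
            }
        }
        // Logistic transfer function for the hidden layer.
        for (int k = 0; k < hiddenUnits; ++k) {
            hiddenUnitNodeValue[k] = logistic(hiddenUnitNodeValue[k]);
        }
        // Weighted sums into the output layer.
        double[] out = new double[4];
        for (int j = 0; j < 4; ++j) {
            for (int i = 0; i < hiddenUnits; ++i) {
                out[j] += outHddenUnitsWeights[i][j] * hiddenUnitNodeValue[i];
            }
        }
        // Softmax transfer function for the output layer.
        double sum = 0;
        for (int j = 0; j < 4; ++j) {
            out[j] = logistic(out[j]);
            sum += out[j];
        }
        for (int j = 0; j < 4; ++j) {
            out[j] = out[j] / sum;
        }
        // Output error gradient: simply target minus output.
        double[] error = new double[4];
        for (int i = 0; i < 4; ++i) {
            error[i] = expectedOutput[i] - out[i];
        }
        return error;
    }

    With this version, calcHiddenError and the two weight-update methods can stay as they are: the hidden-layer error already uses the logistic derivative hiddenUnitNodeValue[i] * (1.0 - hiddenUnitNodeValue[i]).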