Tags: java, neural-network, xor

Java XOR Neural Network not training properly


I have a neural network with 2 inputs, 2 hidden neurons and 1 output neuron to solve the XOR problem. I randomly initialise the weights between 0 and 1, and I use a learning rate of 0.1 with a sigmoid activation function.

When I train on only one pattern, for example inputs 1 and 0 with a target of 1, it works fine and gives an appropriate guess. However, when I try to train on all the possible inputs together, the output converges to around 0.5-0.6.

I have tried changing the learning rate, the range in which the weights are randomly initialised, and the number of times the network is trained, but it makes no difference to the final output.

Here is a link to my code on GitHub.

Any ideas on how I could fix this issue?


Solution

  • I suspect that the backpropagation isn't implemented properly. An overview is given e.g. in http://users.pja.edu.pl/~msyd/wyk-nai/multiLayerNN-en.pdf, in particular on pages 17 to 20.
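
    In standard backpropagation notation (squared error, sigmoid activations, learning rate $\eta$), the two update rules the corrected code below implements are

    $$\delta_j = (t_j - y_j)\,y_j(1 - y_j), \qquad \Delta w_{ij} = \eta\,\delta_j\,h_i$$

    for output neuron $j$, and

    $$\delta_i^{h} = h_i(1 - h_i)\sum_j \delta_j\,w_{ij}, \qquad \Delta v_{ki} = \eta\,\delta_i^{h}\,x_k$$

    for hidden neuron $i$, where $h_i$ is the output of hidden neuron $i$ and $v_{ki}$ the weight from input $x_k$ to hidden neuron $i$. The sum $\sum_j \delta_j\,w_{ij}$ over the output neurons is exactly what the code stores in weightedDeltaHidden and accumulates into weightedDeltaHiddenTotal.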

    The tuneWeights- and delta_weights-methods of the Output_Neuron-class are implemented properly. However, in this step the array weightedDeltaHidden (see comment in the code) must also be determined; it is needed later when the weights of the Hidden_Neuron-class are tuned.

    The tuneWeights- and delta_weights-methods of the Hidden_Neuron-class don't seem to be implemented properly. Here, among other things, the previously determined array weightedDeltaHidden must be used.

    In the code below I've made the necessary changes without essentially changing the design of the code, though a refactoring might make sense.

    Changes in the Output_Neuron-class:

    ...
    
    private double[] weightedDeltaHidden;
    
    ...
    
    Output_Neuron(int hiddenNeurons) {
    
        ...
    
        this.weightedDeltaHidden = new double[hiddenNeurons];
    }
    
    ...
    
    void tuneWeights(double LR, double[] hidden_output, int target) {
        double delta = (target - output) * f.dSigmoid(output);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += delta_weights(i, LR, delta, hidden_output);
        }
    }
    
    double delta_weights(int i, double LR, double delta, double[] hidden_output) {
        weightedDeltaHidden[i] = delta * weights[i]; // weightedDeltaHidden[i] is the product of this output neuron's delta
                                                     // and the weight from the i-th hidden neuron to this output neuron.
                                                     // That value is needed later when the hidden weights are tuned...
        return LR * delta * hidden_output[i];
    }
    
    ...
    
    double[] getWeightedDeltaHidden() {
        return weightedDeltaHidden;
    }
    

    Changes in the Hidden_Neuron-class:

    ...
    
    void tuneWeights(double LR, int[] inputs, double weightedDeltaHiddenTotal) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] += delta_weights(LR, inputs[i], weightedDeltaHiddenTotal);
        }
    }
    
    private double delta_weights(double LR, double input, double weightedDeltaHiddenTotal) {
        double deltaOutput = f.dSigmoid(output) * weightedDeltaHiddenTotal;
        return LR * deltaOutput * input;
    }
    
    ...
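
    Both classes call f.dSigmoid(output) with the already-activated output. For the updates to be correct, dSigmoid must therefore compute y * (1 - y) rather than apply the derivative to the raw weighted sum. A minimal sketch of such a helper (the class name Functions is an assumption; the original repository's helper is not shown):

    ```java
    // Hypothetical activation helper; the original repository's version is not shown.
    public class Functions {
        // logistic sigmoid
        public double sigmoid(double x) {
            return 1.0 / (1.0 + Math.exp(-x));
        }

        // sigmoid derivative expressed in terms of the already-activated
        // output y = sigmoid(x): sigma'(x) = sigmoid(x) * (1 - sigmoid(x)) = y * (1 - y)
        public double dSigmoid(double y) {
            return y * (1.0 - y);
        }

        public static void main(String[] args) {
            Functions f = new Functions();
            System.out.println(f.sigmoid(0.0));   // 0.5
            System.out.println(f.dSigmoid(0.5));  // 0.25
        }
    }
    ```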
    

    Changes in the Network-class inside the train-method where the tuning of the hidden weights takes place:

    void train(int[] inputs, int target) {
    
        ...
    
        // tune hidden weights: weightedDeltaHiddenTotal is the sum of weightedDeltaHidden
        // over all output neurons, i.e. the sum over j of the delta of the j-th output
        // neuron times the weight from the i-th hidden neuron to that output neuron
        for (int i = 0; i < numOfHiddenNeurons; i++) {
            double weightedDeltaHiddenTotal = 0;
            for (int j = 0; j < numOfOutputNeurons; j++) {
                weightedDeltaHiddenTotal += output_neurons[j].getWeightedDeltaHidden()[i];
            }
            hidden_neurons[i].tuneWeights(LR, inputs, weightedDeltaHiddenTotal);
        }
    }
    

    With those changes, a typical output for 1_000_000 train-calls (2 hidden neurons) is

    Error: 1.9212e-01 in cycle 0
    Error: 8.9284e-03 in cycle 100000
    Error: 1.5049e-03 in cycle 200000
    Error: 4.7214e-03 in cycle 300000
    Error: 4.4727e-03 in cycle 400000
    Error: 2.1179e-03 in cycle 500000
    Error: 2.9165e-04 in cycle 600000
    Error: 2.0655e-03 in cycle 700000
    Error: 1.5381e-03 in cycle 800000
    Error: 1.0440e-03 in cycle 900000
    0 0: 0.0170
    1 0: 0.9616
    0 1: 0.9612
    1 1: 0.0597
    

    and for 100_000_000 train-calls (2 hidden neurons)

    Error: 2.4755e-01 in cycle 0
    Error: 2.7771e-04 in cycle 5000000
    Error: 6.8378e-06 in cycle 10000000
    Error: 5.4317e-05 in cycle 15000000
    Error: 6.8956e-05 in cycle 20000000
    Error: 2.1072e-06 in cycle 25000000
    Error: 2.6281e-05 in cycle 30000000
    Error: 2.1630e-05 in cycle 35000000
    Error: 1.1546e-06 in cycle 40000000
    Error: 1.7690e-05 in cycle 45000000
    Error: 8.6837e-07 in cycle 50000000
    Error: 1.3603e-05 in cycle 55000000
    Error: 1.2905e-05 in cycle 60000000
    Error: 2.1657e-05 in cycle 65000000
    Error: 1.1594e-05 in cycle 70000000
    Error: 1.9191e-05 in cycle 75000000
    Error: 1.7273e-05 in cycle 80000000
    Error: 9.1364e-06 in cycle 85000000
    Error: 1.5221e-05 in cycle 90000000
    Error: 1.4501e-05 in cycle 95000000
    0 0: 0.0008
    1 0: 0.9961
    0 1: 0.9961
    1 1: 0.0053
    

    Increasing the number of hidden neurons improves the performance. Below, a typical output for 1_000_000 train-calls (4 hidden neurons) is shown:

    Error: 1.2617e-02 in cycle 0
    Error: 7.9950e-04 in cycle 100000
    Error: 4.2567e-04 in cycle 200000
    Error: 1.7279e-04 in cycle 300000
    Error: 1.2246e-04 in cycle 400000
    Error: 1.0456e-04 in cycle 500000
    Error: 6.9140e-05 in cycle 600000
    Error: 6.8698e-05 in cycle 700000
    Error: 5.1640e-05 in cycle 800000
    Error: 4.4534e-05 in cycle 900000
    0 0: 0.0092
    1 0: 0.9905
    0 1: 0.9912
    1 1: 0.0089
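
    For reference, the whole corrected approach condenses into a single self-contained sketch. Class, field and method names here are illustrative and differ from the original repository, and bias weights are added (the original code may handle them differently) because they noticeably help convergence:

    ```java
    import java.util.Random;

    // Self-contained 2-n-1 XOR network with the corrected backpropagation.
    // Names are illustrative, not the original repository's.
    public class XorSketch {
        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
        // derivative in terms of the activated output y = sigmoid(x)
        static double dSigmoid(double y) { return y * (1.0 - y); }

        final int nHidden;
        final double[][] wHidden; // wHidden[i] = {w_x0, w_x1, bias} of hidden neuron i
        final double[] wOut;      // hidden-to-output weights, last entry = output bias
        final double lr;

        XorSketch(int nHidden, double lr, long seed) {
            this.nHidden = nHidden;
            this.lr = lr;
            Random rnd = new Random(seed);
            wHidden = new double[nHidden][3];
            wOut = new double[nHidden + 1];
            for (int i = 0; i < nHidden; i++)
                for (int j = 0; j < 3; j++)
                    wHidden[i][j] = rnd.nextDouble() * 2 - 1; // uniform in [-1, 1]
            for (int i = 0; i <= nHidden; i++)
                wOut[i] = rnd.nextDouble() * 2 - 1;
        }

        double[] hiddenOut(int[] in) {
            double[] h = new double[nHidden];
            for (int i = 0; i < nHidden; i++)
                h[i] = sigmoid(wHidden[i][0] * in[0] + wHidden[i][1] * in[1] + wHidden[i][2]);
            return h;
        }

        double outputFromHidden(double[] h) {
            double sum = wOut[nHidden]; // output bias
            for (int i = 0; i < nHidden; i++) sum += wOut[i] * h[i];
            return sigmoid(sum);
        }

        double feedForward(int[] in) { return outputFromHidden(hiddenOut(in)); }

        void train(int[] in, int target) {
            double[] h = hiddenOut(in);
            double out = outputFromHidden(h);
            double deltaOut = (target - out) * dSigmoid(out);
            // capture delta * old weight BEFORE updating the output weights
            double[] weightedDeltaHidden = new double[nHidden];
            for (int i = 0; i < nHidden; i++) weightedDeltaHidden[i] = deltaOut * wOut[i];
            for (int i = 0; i < nHidden; i++) wOut[i] += lr * deltaOut * h[i];
            wOut[nHidden] += lr * deltaOut;
            for (int i = 0; i < nHidden; i++) {
                double deltaHidden = dSigmoid(h[i]) * weightedDeltaHidden[i];
                wHidden[i][0] += lr * deltaHidden * in[0];
                wHidden[i][1] += lr * deltaHidden * in[1];
                wHidden[i][2] += lr * deltaHidden;
            }
        }

        public static void main(String[] args) {
            int[][] xs = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
            int[] ts = {0, 1, 1, 0};
            XorSketch net = new XorSketch(4, 0.5, 42);
            for (int epoch = 0; epoch < 500_000; epoch++)
                for (int k = 0; k < 4; k++)
                    net.train(xs[k], ts[k]);
            for (int k = 0; k < 4; k++)
                System.out.printf("%d %d: %.4f%n", xs[k][0], xs[k][1], net.feedForward(xs[k]));
        }
    }
    ```

    Depending on the random seed the network can still get stuck near 0.5 for every input; re-initialising the weights and retraining is the usual workaround.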