Tags: java, neural-network, xor

Java XOR Neural Network not training properly


I have a neural network with 2 inputs, 2 hidden neurons and 1 output neuron to solve the XOR problem. I randomly initialise the weights between 0 and 1, and I use a learning rate of 0.1 with a sigmoid activation function.

When I train on only one pattern, for example inputs 1 and 0 with a target of 1, it works fine and gives an appropriate guess. However, when I try to train on all the possible inputs together, the output converges to around 0.5-0.6.

I have tried changing the learning rate, the range in which the weights are randomly initialised, and the number of times the network is trained, but it makes no difference to the final output.

Here is a link to my code on GitHub.

Any ideas on how I could fix this issue?


Solution

  • I suspect that the backpropagation isn't implemented properly. An overview is given e.g. in http://users.pja.edu.pl/~msyd/wyk-nai/multiLayerNN-en.pdf, in particular on pages 17 to 20.
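
    In standard backpropagation notation (squared error, sigmoid activations, learning rate $\eta$), the two update rules the corrected code below implements are

    $$\delta_j = (t_j - y_j)\,y_j(1 - y_j), \qquad \Delta w_{ij} = \eta\,\delta_j\,h_i$$

    for output neuron $j$, and

    $$\delta_i^{h} = h_i(1 - h_i)\sum_j \delta_j\,w_{ij}, \qquad \Delta v_{ki} = \eta\,\delta_i^{h}\,x_k$$

    for hidden neuron $i$, where $h_i$ is the output of hidden neuron $i$ and $v_{ki}$ the weight from input $x_k$ to hidden neuron $i$. The sum $\sum_j \delta_j\,w_{ij}$ over the output neurons is exactly what the code stores in weightedDeltaHidden and accumulates into weightedDeltaHiddenTotal.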

    The tuneWeights- and delta_weights-methods of the Output_Neuron-class are implemented properly. However, in this step the array weightedDeltaHidden (see comment in the code) must also be determined; it is needed later when the weights of the Hidden_Neuron-class are tuned.

    The tuneWeights- and delta_weights-methods of the Hidden_Neuron-class don't seem to be implemented properly. Here, among other things, the previously determined array weightedDeltaHidden must be used.

    In the code below I've made the necessary changes without essentially changing the design of the code, though a refactoring might make sense.

    Changes in the Output_Neuron-class:

    ...
    
    private double[] weightedDeltaHidden;
    
    ...
    
    Output_Neuron(int hiddenNeurons) {
    
        ...
    
        this.weightedDeltaHidden = new double[hiddenNeurons];
    }
    
    ...
    
    void tuneWeights(double LR, double[] hidden_output, int target) {
        double delta = (target - output) * f.dSigmoid(output);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += delta_weights(i, LR, delta, hidden_output);
        }
    }
    
    double delta_weights(int i, double LR, double delta, double[] hidden_output) {
        weightedDeltaHidden[i] = delta * weights[i]; // weightedDeltaHidden[i] is the product of this output neuron's delta
                                                     // and the weight from the i-th hidden neuron to this output neuron.
                                                     // That value is needed later when the hidden weights are tuned...
        return LR * delta * hidden_output[i];
    }
    
    ...
    
    double[] getWeightedDeltaHidden() {
        return weightedDeltaHidden;
    }
    

    Changes in the Hidden_Neuron-class:

    ...
    
    void tuneWeights(double LR, int[] inputs, double weightedDeltaHiddenTotal) {
        for (int i = 0; i < weights.length; i++) {
            weights[i] += delta_weights(LR, inputs[i], weightedDeltaHiddenTotal);
        }
    }
    
    private double delta_weights(double LR, double input, double weightedDeltaHiddenTotal) {
        double deltaOutput = f.dSigmoid(output) * weightedDeltaHiddenTotal;
        return LR * deltaOutput * input;
    }
    
    ...
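
    Both classes call f.dSigmoid(output) with the already-activated output. For the updates to be correct, dSigmoid must therefore compute y * (1 - y) rather than apply the derivative to the raw weighted sum. A minimal sketch of such a helper (the class name Functions is an assumption; the original repository's helper is not shown):

    ```java
    // Hypothetical activation helper; the original repository's version is not shown.
    public class Functions {
        // logistic sigmoid
        public double sigmoid(double x) {
            return 1.0 / (1.0 + Math.exp(-x));
        }

        // sigmoid derivative expressed in terms of the already-activated
        // output y = sigmoid(x): sigma'(x) = sigmoid(x) * (1 - sigmoid(x)) = y * (1 - y)
        public double dSigmoid(double y) {
            return y * (1.0 - y);
        }

        public static void main(String[] args) {
            Functions f = new Functions();
            System.out.println(f.sigmoid(0.0));   // 0.5
            System.out.println(f.dSigmoid(0.5));  // 0.25
        }
    }
    ```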
    

    Changes in the Network-class inside the train-method where the tuning of the hidden weights takes place:

    void train(int[] inputs, int target) {
    
        ...
    
        // tune hidden weights: weightedDeltaHiddenTotal is the sum of weightedDeltaHidden
        // over all output neurons, i.e. the sum over j of the delta of the j-th output
        // neuron times the weight from the i-th hidden neuron to that output neuron
        for (int i = 0; i < numOfHiddenNeurons; i++) {
            double weightedDeltaHiddenTotal = 0;
            for (int j = 0; j < numOfOutputNeurons; j++) {
                weightedDeltaHiddenTotal += output_neurons[j].getWeightedDeltaHidden()[i];
            }
            hidden_neurons[i].tuneWeights(LR, inputs, weightedDeltaHiddenTotal);
        }
    }
    

    With those changes, a typical output for 1_000_000 train-calls (2 hidden neurons) is

    Error: 1.9212e-01 in cycle 0
    Error: 8.9284e-03 in cycle 100000
    Error: 1.5049e-03 in cycle 200000
    Error: 4.7214e-03 in cycle 300000
    Error: 4.4727e-03 in cycle 400000
    Error: 2.1179e-03 in cycle 500000
    Error: 2.9165e-04 in cycle 600000
    Error: 2.0655e-03 in cycle 700000
    Error: 1.5381e-03 in cycle 800000
    Error: 1.0440e-03 in cycle 900000
    0 0: 0.0170
    1 0: 0.9616
    0 1: 0.9612
    1 1: 0.0597
    

    and for 100_000_000 train-calls (2 hidden neurons)

    Error: 2.4755e-01 in cycle 0
    Error: 2.7771e-04 in cycle 5000000
    Error: 6.8378e-06 in cycle 10000000
    Error: 5.4317e-05 in cycle 15000000
    Error: 6.8956e-05 in cycle 20000000
    Error: 2.1072e-06 in cycle 25000000
    Error: 2.6281e-05 in cycle 30000000
    Error: 2.1630e-05 in cycle 35000000
    Error: 1.1546e-06 in cycle 40000000
    Error: 1.7690e-05 in cycle 45000000
    Error: 8.6837e-07 in cycle 50000000
    Error: 1.3603e-05 in cycle 55000000
    Error: 1.2905e-05 in cycle 60000000
    Error: 2.1657e-05 in cycle 65000000
    Error: 1.1594e-05 in cycle 70000000
    Error: 1.9191e-05 in cycle 75000000
    Error: 1.7273e-05 in cycle 80000000
    Error: 9.1364e-06 in cycle 85000000
    Error: 1.5221e-05 in cycle 90000000
    Error: 1.4501e-05 in cycle 95000000
    0 0: 0.0008
    1 0: 0.9961
    0 1: 0.9961
    1 1: 0.0053
    

    Increasing the number of hidden neurons improves the performance. Below, a typical output for 1_000_000 train-calls (4 hidden neurons) is shown:

    Error: 1.2617e-02 in cycle 0
    Error: 7.9950e-04 in cycle 100000
    Error: 4.2567e-04 in cycle 200000
    Error: 1.7279e-04 in cycle 300000
    Error: 1.2246e-04 in cycle 400000
    Error: 1.0456e-04 in cycle 500000
    Error: 6.9140e-05 in cycle 600000
    Error: 6.8698e-05 in cycle 700000
    Error: 5.1640e-05 in cycle 800000
    Error: 4.4534e-05 in cycle 900000
    0 0: 0.0092
    1 0: 0.9905
    0 1: 0.9912
    1 1: 0.0089
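
    For reference, the whole corrected approach condenses into a single self-contained sketch. Class, field and method names here are illustrative and differ from the original repository, and bias weights are added (the original code may handle them differently) because they noticeably help convergence:

    ```java
    import java.util.Random;

    // Self-contained 2-n-1 XOR network with the corrected backpropagation.
    // Names are illustrative, not the original repository's.
    public class XorSketch {
        static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }
        // derivative in terms of the activated output y = sigmoid(x)
        static double dSigmoid(double y) { return y * (1.0 - y); }

        final int nHidden;
        final double[][] wHidden; // wHidden[i] = {w_x0, w_x1, bias} of hidden neuron i
        final double[] wOut;      // hidden-to-output weights, last entry = output bias
        final double lr;

        XorSketch(int nHidden, double lr, long seed) {
            this.nHidden = nHidden;
            this.lr = lr;
            Random rnd = new Random(seed);
            wHidden = new double[nHidden][3];
            wOut = new double[nHidden + 1];
            for (int i = 0; i < nHidden; i++)
                for (int j = 0; j < 3; j++)
                    wHidden[i][j] = rnd.nextDouble() * 2 - 1; // uniform in [-1, 1]
            for (int i = 0; i <= nHidden; i++)
                wOut[i] = rnd.nextDouble() * 2 - 1;
        }

        double[] hiddenOut(int[] in) {
            double[] h = new double[nHidden];
            for (int i = 0; i < nHidden; i++)
                h[i] = sigmoid(wHidden[i][0] * in[0] + wHidden[i][1] * in[1] + wHidden[i][2]);
            return h;
        }

        double outputFromHidden(double[] h) {
            double sum = wOut[nHidden]; // output bias
            for (int i = 0; i < nHidden; i++) sum += wOut[i] * h[i];
            return sigmoid(sum);
        }

        double feedForward(int[] in) { return outputFromHidden(hiddenOut(in)); }

        void train(int[] in, int target) {
            double[] h = hiddenOut(in);
            double out = outputFromHidden(h);
            double deltaOut = (target - out) * dSigmoid(out);
            // capture delta * old weight BEFORE updating the output weights
            double[] weightedDeltaHidden = new double[nHidden];
            for (int i = 0; i < nHidden; i++) weightedDeltaHidden[i] = deltaOut * wOut[i];
            for (int i = 0; i < nHidden; i++) wOut[i] += lr * deltaOut * h[i];
            wOut[nHidden] += lr * deltaOut;
            for (int i = 0; i < nHidden; i++) {
                double deltaHidden = dSigmoid(h[i]) * weightedDeltaHidden[i];
                wHidden[i][0] += lr * deltaHidden * in[0];
                wHidden[i][1] += lr * deltaHidden * in[1];
                wHidden[i][2] += lr * deltaHidden;
            }
        }

        public static void main(String[] args) {
            int[][] xs = {{0, 0}, {1, 0}, {0, 1}, {1, 1}};
            int[] ts = {0, 1, 1, 0};
            XorSketch net = new XorSketch(4, 0.5, 42);
            for (int epoch = 0; epoch < 500_000; epoch++)
                for (int k = 0; k < 4; k++)
                    net.train(xs[k], ts[k]);
            for (int k = 0; k < 4; k++)
                System.out.printf("%d %d: %.4f%n", xs[k][0], xs[k][1], net.feedForward(xs[k]));
        }
    }
    ```

    Depending on the random seed the network can still get stuck near 0.5 for every input; re-initialising the weights and retraining is the usual workaround.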