artificial-intelligence neural-network backpropagation encog

Bug in Resilient Backpropagation?


I'm struggling to implement Resilient Propagation correctly. I have already implemented the backpropagation algorithm to train a neural network, and it works as expected for an XOR net, i.e. it takes about 600 epochs for the error to drop below 1%. Now I have tried implementing Resilient Propagation (http://en.wikipedia.org/wiki/Rprop) for the same problem: for the first few epochs the error drops quickly to 23%, but then it rises to 50% and stays there.
I implemented it exactly as described in http://www.heatonresearch.com/book/introduction-neural-network-math.html, but that description is puzzling: it differs from the Wikipedia Rprop page and from the implementation in Encog, which, as far as I know, was written by the same author as the book. I have also tried the implementations from the other sources, but none of them worked.

Some of the differences between the various sources:

  • using signum(currentPartialDerivative) instead of signum(currentPartialDerivative * previousPartialDerivative) for the calculation of the weight-changes
  • using the last weight change, instead of the new update value for the new weight change
  • calculating the weight changes first and the new update values second

What is the correct way of implementing this? Below is my implementation according to the book:

public ResilientPropagation() {
    initialUpdateValue = 0.01;
    deltaMaximum = 50;
    deltaMinimum = 0.000001;
    negativeEta = 0.5;
    positiveEta = 1.2;
    double zeroTolerance = 0.0000000000000001;
    signum = new Signum(zeroTolerance);

    init();
}

@Override
public double calculateWeightChange(Synapse synapse, double partialDerivative) {
    if (!synapseValues.containsKey(synapse)){
        double initialPartialDerivative = 0;
        synapseValues.put(synapse, new SynapseValues(initialUpdateValue, initialPartialDerivative));
    }

    SynapseValues values = synapseValues.get(synapse);
    double signChange = signum.value(values.lastPartialDerivative * partialDerivative);
    values.lastPartialDerivative = partialDerivative;
    double weightChange = 0;
    if (signChange > 0){
        newUpdateValue = Math.min(positiveEta * values.updateValue, deltaMaximum);
        weightChange = -1*newUpdateValue;
    } else if (signChange < 0){
        newUpdateValue = Math.max(negativeEta * values.updateValue, deltaMinimum);
        weightChange = newUpdateValue;
    } else {
        newUpdateValue = values.updateValue;
        double weightChange = 0;
    }
    values.updateValue = newUpdateValue;
    return weightChange;
}

The same method for normal backpropagation works fine:

@Override
public double calculateWeightChange(Synapse synapse, double partialDerivative) {
    double previousChange = previousWeightChange.get(synapse) != null ? previousWeightChange.get(synapse) : 0;
    double weightChange = learningRate * partialDerivative + momentum * previousChange;
    previousWeightChange.put(synapse, weightChange);
    return weightChange;
}

Solution

  • There are several different variants of the RPROP algorithm. Encog has been modified to support more of them since the publication of the book. The book focuses on the classic RPROP, as defined in a paper by Riedmiller. Subsequent papers introduced additional variants. This accounts for some of the differences between Encog's optimized RPROP algorithm and what the book describes.

    Looking at your code above, I have a few suggestions that might help. Mostly I am unsure about your final else clause. You have "double weightChange = 0", which will do nothing; I think you need to remove the "double" so that the statement assigns to the existing variable. You also need to establish some tolerance for what "zero" is. The change in gradients will rarely hit zero exactly, so I would establish some range around zero, maybe -0.00001 to +0.00001, for the else clause to fire. Then make sure you actually set weightChange to zero.
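
The tolerance idea can be sketched as a small helper. This is only an illustration: the class name ToleranceSign and its method are hypothetical and not taken from the question's Signum class or from Encog, and the tolerance value is whatever the caller chooses.

```java
/** Hypothetical helper: a sign function with a dead zone around zero. */
final class ToleranceSign {
    private ToleranceSign() {}

    /**
     * Returns -1, 0 or +1. Any value whose magnitude is at most
     * {@code tolerance} is treated as zero, so the "no sign change"
     * branch can actually fire for tiny products.
     */
    static int sign(double x, double tolerance) {
        if (Math.abs(x) <= tolerance) {
            return 0;
        }
        return x > 0 ? 1 : -1;
    }
}
```

With a tolerance of 0.00001, a product such as 0.0000001 falls into the zero branch, while 0.5 and -0.5 are still classified as positive and negative.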

    Another issue that I recall from my own RPROP implementation was that the sign of the gradient used for RPROP was the inverse of the sign of the gradient used for backpropagation. You might try flipping the sign of the gradient for RPROP; this was necessary in my Encog implementation.
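
To make the sign question concrete, here is a small sketch under two stated assumptions: "gradient" means dE/dw, and the returned change is added to the weight. Under that convention the step has to oppose the gradient. If the value passed in is already the descent direction (which adding learningRate * partialDerivative to the weight in the backpropagation method would suggest), it has to be negated first. The class and method names are hypothetical.

```java
/** Hypothetical illustration of the two sign conventions. */
final class SignConvention {
    private SignConvention() {}

    /** Input is dE/dw: step against the gradient by the update value. */
    static double changeFromGradient(double gradient, double updateValue) {
        return -Math.signum(gradient) * updateValue;
    }

    /** Input is already the descent direction (-dE/dw): flip it first. */
    static double changeFromDescentDirection(double descent, double updateValue) {
        return changeFromGradient(-descent, updateValue);
    }
}
```

Feeding the same number into both methods yields changes of opposite sign, which is exactly the kind of mismatch that can make the error climb instead of fall.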

    This implementation of RPROP might be useful for you; it is the classic Riedmiller implementation. It does function correctly and the error converges.

    https://github.com/encog/encog-java-core/blob/master/src/main/java/org/encog/neural/freeform/training/FreeformResilientPropagation.java
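
For orientation, the following is a minimal single-weight sketch of the classic rule with weight backtracking, written from the commonly published description and not copied from the Encog source. It assumes that the gradient passed in is dE/dw and that the caller adds the returned change to the weight. The class name RpropWeight is hypothetical; the constants are the ones from the question.

```java
/** Sketch of per-weight RPROP state (classic variant with backtracking). */
final class RpropWeight {
    static final double ETA_PLUS = 1.2;
    static final double ETA_MINUS = 0.5;
    static final double DELTA_MAX = 50.0;
    static final double DELTA_MIN = 1e-6;

    private double updateValue = 0.01;   // current step size for this weight
    private double lastGradient = 0.0;   // gradient seen in the previous step
    private double lastWeightChange = 0.0;

    /**
     * @param gradient dE/dw for this weight (assumed convention)
     * @return the change to ADD to the weight
     */
    double step(double gradient) {
        double product = lastGradient * gradient;
        double weightChange;
        if (product > 0) {
            // Same direction as before: grow the step, move against the gradient.
            updateValue = Math.min(updateValue * ETA_PLUS, DELTA_MAX);
            weightChange = -Math.signum(gradient) * updateValue;
            lastGradient = gradient;
        } else if (product < 0) {
            // Sign flipped, a minimum was jumped over: shrink the step
            // and undo the previous weight change.
            updateValue = Math.max(updateValue * ETA_MINUS, DELTA_MIN);
            weightChange = -lastWeightChange;
            lastGradient = 0.0;  // forces the neutral branch on the next call
        } else {
            // No usable history: keep the step size, move against the gradient.
            weightChange = -Math.signum(gradient) * updateValue;
            lastGradient = gradient;
        }
        lastWeightChange = weightChange;
        return weightChange;
    }

    public static void main(String[] args) {
        // Toy problem: minimise E(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
        RpropWeight state = new RpropWeight();
        double w = 0.0;
        for (int i = 0; i < 200; i++) {
            w += state.step(2.0 * (w - 3.0));
        }
        System.out.println("w after 200 steps: " + w);
    }
}
```

Note that in every branch that moves the weight, the direction comes from the sign of the current gradient; the product of the two gradients only decides whether the step size grows or shrinks.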

    Not sure if this will help. Without running the code, this is all that I see.