Tags: java, artificial-intelligence, neural-network, backpropagation

Back propagation algorithm - error derivative calculation


When calculating the error derivative, the following is what I am using and it works, but I am not sure exactly why.

double errorDerivative = (-output * (1-output) *(desiredOutput - output));

When I remove the minus from the first output, it fails and reaches the maximum epoch limit. I'm assuming it should look like the example here http://homepages.gold.ac.uk/nikolaev/311imlti.htm, which doesn't use a minus operator.

double errorDerivative2 = (output * (1-output) *(desiredOutput - output));
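For reference, assuming a squared-error cost and a sigmoid output neuron (an assumption on my part, not something stated in the code above), the chain rule gives the derivative of the error with respect to the neuron's net input as:

    E = \frac{1}{2}\,(d - o)^2, \qquad o = \sigma(\mathit{net}), \qquad
    \frac{\partial E}{\partial \mathit{net}}
      = \frac{\partial E}{\partial o}\,\frac{\partial o}{\partial \mathit{net}}
      = -(d - o)\, o\,(1 - o)

where d is desiredOutput and o is output, which is where the leading minus sign in errorDerivative comes from.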

I'm currently looking over and modifying an existing BackPropagation implementation that uses stochastic gradient descent, and I want to make it use the standard back propagation algorithm. Currently, it looks like this.

public void applyBackpropagation(double expectedOutput[]) {

        // error check, normalize value ]0;1[
        /*for (int i = 0; i < expectedOutput.length; i++) {
            double d = expectedOutput[i];
            if (d < 0 || d > 1) {
                if (d < 0)
                    expectedOutput[i] = 0 + epsilon;
                else
                    expectedOutput[i] = 1 - epsilon;
            }
        }*/

        int i = 0;
        for (Neuron n : outputLayer) {
            System.out.println("neuron");
            ArrayList<Connection> connections = n.getAllInConnections();
            for (Connection con : connections) {
                double output = n.getOutput();
                System.out.println("final output is "+output);
                double ai = con.leftNeuron.getOutput();
                System.out.println("ai output is "+ai);
                double desiredOutput = expectedOutput[i];

                double errorDerivative = (-output * (1-output) *(desiredOutput - output));
                double errorDerivative2 = (output * (1-output) *(desiredOutput - output));
                System.out.println("errorDerivative is "+errorDerivative);
                System.out.println("errorDerivative my one is "+(output * (1-output) *(desiredOutput - output)));
                double deltaWeight = -learningRate * errorDerivative2;
                double newWeight = con.getWeight() + deltaWeight;
                con.setDeltaWeight(deltaWeight);
                con.setWeight(newWeight + momentum * con.getPrevDeltaWeight());
            }
            i++;
        }

        // update weights for the hidden layer
        for (Neuron n : hiddenLayer) {
            ArrayList<Connection> connections = n.getAllInConnections();
            for (Connection con : connections) {
                double output = n.getOutput();
                double ai = con.leftNeuron.getOutput();
                double sumKoutputs = 0;
                int j = 0;
                for (Neuron out_neu : outputLayer) {
                    double wjk = out_neu.getConnection(n.id).getWeight();
                    double desiredOutput = (double) expectedOutput[j];
                    double ak = out_neu.getOutput();
                    j++;
                    sumKoutputs = sumKoutputs
                            + (-(desiredOutput - ak) * ak * (1 - ak) * wjk);
                }

                double partialDerivative = output * (1 - output) * ai * sumKoutputs;
                double deltaWeight = -learningRate * partialDerivative;
                double newWeight = con.getWeight() + deltaWeight;
                con.setDeltaWeight(deltaWeight);
                con.setWeight(newWeight + momentum * con.getPrevDeltaWeight());
            }
        }
    }

Solution

  • Sorry, I won't review your code - no time for that. You will have to come back with more specific questions, and then I can help you.

    The reason errorDerivative2 works is probably that you are using a weight update rule such as
    deltaW = learningRate*errorDerivative2*input

    Normally, what you refer to as 'errorDerivative2' is known as delta and is defined as
    -output * (1 - output) * (desiredOutput - output)
    for a neuron with a sigmoid transfer function,

    with the weight update rule
    deltaW = -learningRate*delta*input

    So basically it works for you without a minus sign on errorDerivative2 because you have left out a minus sign in another place as well - the two sign changes cancel out (see the sketch below).
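
    A minimal sketch of that cancellation, with made-up numbers and names (DeltaSignDemo and the example values are purely illustrative, not taken from the question's code):

        public class DeltaSignDemo {
            public static void main(String[] args) {
                // Made-up example values, purely for illustration.
                double learningRate = 0.5;
                double output = 0.7;         // sigmoid output o of the neuron
                double desiredOutput = 1.0;  // target d
                double input = 0.3;          // activation feeding the weight (ai)

                // Convention 1: delta = dE/dnet = -(d - o) * o * (1 - o),
                // with the weight update deltaW = -learningRate * delta * input.
                double delta = -output * (1 - output) * (desiredOutput - output);
                double deltaW1 = -learningRate * delta * input;

                // Convention 2 (the question's errorDerivative2): the minus sign is
                // dropped from delta AND from the update rule, so the signs cancel.
                double errorDerivative2 = output * (1 - output) * (desiredOutput - output);
                double deltaW2 = learningRate * errorDerivative2 * input;

                // Both print the same value (about 0.00945).
                System.out.println(deltaW1 + " == " + deltaW2);
            }
        }

    Either convention is fine on its own; problems only appear when the sign convention of the delta term and the sign convention of the weight-update rule don't match.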