I've been working on getting some proficiency in backpropagation, and have run across the standard mathematical formula for it. I implemented a solution which seemed to work properly (and passed the relevant test with flying colours).
However ... the actual solution (implemented in MATLAB, and using vectorization) is at odds with the formula in two important respects.
The formula looks like this:
delta_layer2 = (Theta_layer2 transpose) * delta_layer3, element-wise multiplied by gprime(-- not important right now)
The working code looks like this:
% d3 is delta3, d2 is delta2, Theta2 is Theta2 with the bias column removed
% dimensions: d3--[5000x10], d2--[5000x25], Theta2--[10x25]
d3 = (a3 - y2);
d2 = (d3 * Theta2) .* gPrime(z2);
I can't reconcile what I implemented with the mathematical formula, on two counts:
1. the formula transposes Theta, but the working code uses Theta2 as it is;
2. the formula has Theta on the left of the product (Theta2' * delta3), while the code has it on the right (d3 * Theta2).
How can this be? The dimensions of the individual matrices don't seem to allow for any other working combination.
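For reference, here is a quick size check with random data of the shapes listed above (gPrime below is just a stand-in sigmoid derivative, not necessarily the one from my actual code):

d3 = rand(5000, 10);                   % delta of the output layer, one row per example
Theta2 = rand(10, 25);                 % weights with the bias column removed
z2 = rand(5000, 25);                   % weighted inputs of the hidden layer
sig = @(z) 1 ./ (1 + exp(-z));         % assumed sigmoid activation
gPrime = @(z) sig(z) .* (1 - sig(z));  % assumed sigmoid derivative
d2 = (d3 * Theta2) .* gPrime(z2);      % [5000x10] * [10x25] -> [5000x25]
size(d2)                               % prints 5000 25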
Josh
This isn't a bad question, and I don't know why it got those downvotes; implementing a backpropagation algorithm is not as intuitive as it appears. I'm not great at math and I've never used MATLAB (usually C), so I avoided answering this question at first, but it deserves an answer.
First of all, we have to make some simplifications.
1° we will use only one in_Data pattern, so: vector in_Data[N] (in the case below N = 2). (If we succeed with only one pattern, it is not difficult to extend it to a matrix.)
2° we will use this structure: 2 I, 2 H, 2 O (if we succeed with this, we will succeed with any size). This is the network (which I've borrowed from: this blog).
Let's start: we know that to update the weights we need the derivative of the error with respect to each weight, summed over the training patterns:

delta_w = -eta * (1/M) * sum over the M patterns of dE/dw        (note: M = num_pattern)

but we have previously declared in_data as a single vector (one pattern), so you can delete the sum in the formula above, and the matrices in the formulas below become vectors. So this is your new formula:

delta_w = -eta * dE/dw
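For example, with made-up numbers: if eta = 0.5 and dE/dw5 = 0.08 for the current pattern, the step is w5_new = w5 - 0.5 * 0.08 = w5 - 0.04.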
We will study 2 connections: w1 and w5. Let's write their derivatives:
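A sketch of those two derivatives by the chain rule, assuming a squared-error cost (so the output error is simply a - y), that w5 connects H1 to O1 and w1 connects I1 to H1, and writing a for activations and z for weighted inputs:

dE/dw5 = (a_O1 - y_O1) * g'(z_O1) * a_H1
dE/dw1 = [ sum over the outputs O fed by H1 of (a_O - y_O) * g'(z_O) * w_(H1->O) ] * g'(z_H1) * a_I1

Everything except the final a_H1 / a_I1 factor is exactly the d[] value computed in the pseudocode below.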
Let's code them (I really don't know MATLAB, so I'll write pseudocode):
vector d[num_connections + num_output_neurons]  // one entry per connection (8 connections, not counting biases) plus 2 error terms for the output neurons
vector z[num_neurons]       // z is the weighted input of each neuron (the value passed to g)
vector w[num_connections]   // yes, a vector! we have previously removed the matrix and the sum
// O layer
d[10] = a[O1] - y[O1];   // start from the last layer to calculate the error
d[9]  = a[O2] - y[O2];
// H -> O layer
for (i = 5; i <= 8; i++) {   // hidden-to-output connections
    // use the error term (d[9] or d[10]) and the weighted input z
    // of the output neuron that connection i feeds into
    d[i] = d[error_of_output_fed_by(i)] * g_prime(z[output_fed_by(i)]);
}
// I -> H layer
for (i = 1; i <= 4; i++) {   // input-to-hidden connections
    // take for example d[1]: it depends on how many connections its hidden
    // neuron (H1) has towards the output neurons
    d[i] = 0;
    for (each connection k in 5..8 that leaves that hidden neuron) {
        d[i] = d[i] + d[k] * w[k];
    }
    d[i] = d[i] * g_prime(z[hidden_neuron_of(i)]);   // z of the hidden neuron fed by connection i
}
If you need to extend it to a matrix, write it in a comment and I'll extend the code.
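As a first sketch of that matrix extension, in the question's MATLAB notation (assuming, as in the question, that patterns are stored as rows and that Theta2 has had its bias column removed):

d3 = a3 - y2;                       % [5000x10] output-layer error, one row per pattern
d2 = (d3 * Theta2) .* gPrime(z2);   % [5000x10] * [10x25] -> [5000x25]
% this is the same computation as the textbook per-pattern form, because
% (Theta2' * d3')' == d3 * Theta2: transposing both sides of the formula
% swaps the order of the product and removes the explicit transpose.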
And so you have found all the derivatives. Maybe this is not exactly what you are searching for. I'm not even sure that everything I wrote is correct (I hope it is); I will try to code backpropagation in the next few days, so I will be able to correct any errors. I hope this is a little helpful; better than nothing.
Best regards, Marco.