In "TRAINING RECURRENT NEURAL NETWORK" by Ilya Sutskever, there's the following technique for calculating derivatives with backpropagation in feed-forward neural networks.
The network has l hidden layers, l+1 weight matrices, and l+1 bias vectors.
"Forward" stage:
"Backwards" stage:
Isn't there an index problem with l+1? For example, in the forward stage we compute z_{l+1} but return z_l.
(Since this is such a major paper, I guess I'm missing something.)
There is no problem: some of the indices start at 0 (the variable z, for instance) and some start at 1 (the variable x). With l hidden layers, z_0 is the input, z_1, ..., z_l are the hidden activations, and z_{l+1} is the output, while x_1, ..., x_{l+1} are the pre-activations that produce them, so the counts line up. Follow the algorithm as laid out more carefully; try writing it out by hand explicitly for, say, l = 4.
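To make the offset concrete, here is a minimal NumPy sketch of that indexing convention. It is not the thesis's actual algorithm: the function names, the tanh nonlinearity, and the initialization are my own choices. The point is the indexing: the z list stores the input as z[0], while the xs list is padded at index 0 so that x_i exists only for i = 1, ..., l+1.

```python
import numpy as np

def init_params(sizes, seed=0):
    # sizes = [n_0, n_1, ..., n_{l+1}]: input width, l hidden widths, output width.
    # l hidden layers -> l+1 weight matrices W_1..W_{l+1} and l+1 bias vectors b_1..b_{l+1}.
    rng = np.random.default_rng(seed)
    Ws = [0.1 * rng.standard_normal((sizes[i + 1], sizes[i])) for i in range(len(sizes) - 1)]
    bs = [np.zeros(sizes[i + 1]) for i in range(len(sizes) - 1)]
    return Ws, bs

def forward(inp, Ws, bs):
    z = [inp]       # z[0] = z_0: z is indexed from 0
    xs = [None]     # pad index 0: x is indexed from 1, there is no x_0
    for W, b in zip(Ws, bs):
        xs.append(W @ z[-1] + b)     # x_i = W_i z_{i-1} + b_i,  i = 1..l+1
        z.append(np.tanh(xs[-1]))    # z_i = e(x_i), with e = tanh here
    return xs, z                     # z[-1] is z_{l+1}, the output

def backward(z, Ws, dL_dout):
    # dL_dout is the gradient of the loss w.r.t. the output z_{l+1}.
    L = len(Ws)                           # L = l + 1
    dWs, dbs = [None] * L, [None] * L
    delta = dL_dout * (1 - z[-1] ** 2)    # dL/dx_{l+1}; tanh'(x) = 1 - tanh(x)^2
    for i in range(L, 0, -1):             # i = l+1, l, ..., 1
        dWs[i - 1] = np.outer(delta, z[i - 1])   # dL/dW_i = (dL/dx_i) z_{i-1}^T
        dbs[i - 1] = delta                       # dL/db_i = dL/dx_i
        if i > 1:                                # no x_0, so stop at i = 1
            delta = (Ws[i - 1].T @ delta) * (1 - z[i - 1] ** 2)
    return dWs, dbs

# Writing it out for l = 4, as suggested: 4 hidden layers -> 5 weight matrices.
Ws, bs = init_params([3, 5, 5, 5, 5, 2])
xs, z = forward(np.ones(3), Ws, bs)
dWs, dbs = backward(z, Ws, dL_dout=z[-1])   # e.g. for the loss ||z_{l+1}||^2 / 2
assert len(z) == 6 and xs[0] is None        # z_0..z_5 exist; x_0 does not
```

The None pad at xs[0] is the whole resolution: x has no index 0, so the loop over i = 1, ..., l+1 takes z_0 to z_{l+1} with no off-by-one, and returning the last z really does return the output.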