According to Andrew Ng's notes on backpropagation (page 9), the delta values are calculated only for the hidden layers (layers n-1 down to 2). These deltas are then accumulated and used to update the weight matrices.
However, the notes do not mention how to update the weight matrix for layer one.
The weights in the first layer are updated in the same manner as the weights in every subsequent layer:
# Excerpt from my code at GitHub (assumes `import numpy as np`)
dW_matrix = -learning_rate * np.dot(delta, input_signals).T  # weight change
weight_matrix += dW_matrix                                   # apply the update
where `delta` is the delta calculated in the layer above. A delta is calculated for every layer in [1, ->]; there is no need to calculate a delta for layer 0 (the input layer), because there are no further layers to propagate it down to. The weights themselves are always updated, each matrix using the delta from the layer above.
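As a rough, self-contained sketch of how these pieces fit together (this is not the exact code from my repository; the sigmoid units, the squared-error loss, the batch-major shapes, and the names `backprop_step` and `sigmoid` are assumptions made for illustration), one full backward pass could look like this:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(weights, x, target, learning_rate=0.1):
    # Forward pass: activations[i] holds the output of layer i,
    # with activations[0] being the input (layer 0).
    activations = [x]
    for W in weights:
        activations.append(sigmoid(np.dot(activations[-1], W)))

    # Delta of the output layer (squared-error loss, sigmoid units).
    out = activations[-1]
    delta = (out - target) * out * (1.0 - out)

    # Walk backwards over the weight matrices.
    for i in reversed(range(len(weights))):
        input_signals = activations[i]  # the signals that fed layer i + 1
        # Same rule as the excerpt above; delta is stored batch-major here.
        dW_matrix = -learning_rate * np.dot(delta.T, input_signals).T
        if i > 0:
            # Propagate the delta down only while a layer below exists;
            # no delta is ever computed for the input layer (layer 0).
            a = activations[i]
            delta = np.dot(delta, weights[i].T) * a * (1.0 - a)
        weights[i] += dW_matrix  # every matrix, including the first, is updated

rng = np.random.default_rng(0)
weights = [rng.normal(scale=0.1, size=(3, 4)),  # layer 0 -> layer 1
           rng.normal(scale=0.1, size=(4, 2))]  # layer 1 -> layer 2 (output)
x = rng.normal(size=(5, 3))   # a batch of 5 samples with 3 features each
t = rng.uniform(size=(5, 2))  # matching targets
backprop_step(weights, x, t)

Note that the loop body mirrors the excerpt above: the update rule is identical at every layer; only the delta propagation stops once the first weight matrix has been reached.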