Tags: machine-learning, neural-network, gradient-descent

Gradient descent: should the delta value be a scalar or a vector?


When computing the delta values for a neural network after running backpropagation:

[image: the gradient-accumulation equations from backpropagation]

the value of delta(1) comes out as a scalar. Shouldn't it be a vector?

Update:

Taken from http://www.holehouse.org/mlclass/09_Neural_Networks_Learning.html

Specifically: [image: the relevant equations from the course notes]


Solution

  • First, you probably understand that each layer has n x m parameters (or weights) that need to be learned, so they form a 2-D matrix.

    n is the number of nodes in the current layer,
    m is the number of nodes in the previous layer plus 1 (for the bias unit).
    

    We have n x m parameters because every node in the previous layer (including the bias unit) is connected to every node in the current layer.

    I am pretty sure that Delta (big delta) at layer L is used to accumulate the partial-derivative terms for every parameter at layer L, so you have a 2-D matrix of Delta at each layer as well. The entry in the i-th row (the i-th node in the current layer) and j-th column (the j-th node in the previous layer) is updated as:

    D_(i,j) = D_(i,j) + a_j * delta_i
    note a_j     is the activation of the j-th node in the previous layer,
         delta_i is the error of the i-th node in the current layer,
    so each parameter accumulates the error at its output node, scaled by
    the activation at its input node.
    

    Thus, to answer your question: Delta should be a matrix (one per layer), not a scalar or a vector. See the sketch below.
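
    To make the shapes concrete, here is a minimal NumPy sketch of the accumulation step described above. The layer sizes and the random activations/errors are made up purely for illustration:

        import numpy as np

        # Hypothetical layer sizes, chosen only for illustration:
        n = 3      # nodes in the current layer
        m = 4 + 1  # nodes in the previous layer, plus 1 for the bias unit

        rng = np.random.default_rng(0)

        Delta  = np.zeros((n, m))                        # one accumulator per parameter
        a_prev = np.concatenate(([1.0], rng.random(4)))  # previous-layer activations, a_0 = 1 (bias)
        delta  = rng.random(n)                           # error terms of the current layer

        # D_(i,j) += a_j * delta_i for every i, j at once:
        # the outer product of delta (shape (n,)) and a_prev (shape (m,))
        # is exactly the n x m matrix of per-parameter contributions.
        Delta += np.outer(delta, a_prev)

        print(Delta.shape)  # (3, 5) -- a matrix, not a scalar or a vector

    In the course's formulation, the outer-product line is repeated once per training example, and the accumulated Delta is then divided by the number of examples (plus a regularization term) to obtain the partial derivatives used by gradient descent.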