Tags: machine-learning, neural-network, gradient-descent

Gradient descent: should the delta value be a scalar or a vector?


When computing the delta values for a neural network after running backpropagation:

[image: the gradient-accumulation equations from backpropagation]

the value of delta(1) comes out as a scalar. Shouldn't it be a vector?

Update:

Taken from http://www.holehouse.org/mlclass/09_Neural_Networks_Learning.html

Specifically: [image: the relevant equations from the course notes]


Solution

  • First, you probably understand that each layer has n x m parameters (or weights) that need to be learned, so they form a 2-D matrix.

    n is the number of nodes in the current layer,
    m is the number of nodes in the previous layer plus 1 (for the bias unit).
    

    We have n x m parameters because every node in the previous layer (including the bias unit) is connected to every node in the current layer.

    I am pretty sure that Delta (big delta) at layer L is used to accumulate the partial-derivative terms for every parameter at layer L, so you have a 2-D matrix of Delta at each layer as well. The entry in the i-th row (the i-th node in the current layer) and j-th column (the j-th node in the previous layer) is updated as:

    D_(i,j) = D_(i,j) + a_j * delta_i
    note a_j     is the activation of the j-th node in the previous layer,
         delta_i is the error of the i-th node in the current layer,
    so each parameter accumulates the error at its output node, scaled by
    the activation at its input node.
    

    Thus, to answer your question: Delta should be a matrix (one per layer), not a scalar or a vector. See the sketch below.
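
    To make the shapes concrete, here is a minimal NumPy sketch of the accumulation step described above. The layer sizes and the random activations/errors are made up purely for illustration:

        import numpy as np

        # Hypothetical layer sizes, chosen only for illustration:
        n = 3      # nodes in the current layer
        m = 4 + 1  # nodes in the previous layer, plus 1 for the bias unit

        rng = np.random.default_rng(0)

        Delta  = np.zeros((n, m))                        # one accumulator per parameter
        a_prev = np.concatenate(([1.0], rng.random(4)))  # previous-layer activations, a_0 = 1 (bias)
        delta  = rng.random(n)                           # error terms of the current layer

        # D_(i,j) += a_j * delta_i for every i, j at once:
        # the outer product of delta (shape (n,)) and a_prev (shape (m,))
        # is exactly the n x m matrix of per-parameter contributions.
        Delta += np.outer(delta, a_prev)

        print(Delta.shape)  # (3, 5) -- a matrix, not a scalar or a vector

    In the course's formulation, the outer-product line is repeated once per training example, and the accumulated Delta is then divided by the number of examples (plus a regularization term) to obtain the partial derivatives used by gradient descent.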