Tags: neural-network, deep-learning, backpropagation, torch, gradient-descent

How to write the updateGradInput and accGradParameters in torch?


I know the two functions are used in Torch's backward propagation, and their interface is as follows:

    updateGradInput(input, gradOutput)
    accGradParameters(input, gradOutput, scale)

I'm confused about what gradInput and gradOutput really mean in a layer. Assume the network's cost is C and consider a layer L. Do gradInput and gradOutput of layer L mean d_C/d_input_L and d_C/d_output_L?

If so, how to compute gradInput according to gradOutput?

Moreover, does accGradParameters mean to accumulate d_C/d_Weight_L and d_C/d_bias_L? If so, how to compute these values?


Solution

  • Do gradInput and gradOutput of layer L mean d_C/d_input_L and d_C/d_output_L?

    Yes:

    • gradInput = derivative of the cost w.r.t layer's input,
    • gradOutput = derivative of the cost w.r.t layer's output.

    how to compute gradInput according to gradOutput

    Adapting the schema from The building blocks of Deep Learning (note: in that schema the cost is denoted L, for Loss, and the layer f), we have:

        d_C/d_input_L = (d_output_L/d_input_L)^T × d_C/d_output_L

    i.e. gradInput is gradOutput multiplied by the (transposed) Jacobian of the layer's output with respect to its input.

    For a concrete, step-by-step example of such a computation on a LogSoftMax layer you can refer to this answer.
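    As a small numerical illustration of this chain rule (a NumPy sketch, not Torch code; the layer shape and values are made up), take a Linear layer output = W @ x + b, whose Jacobian with respect to the input is simply W:

    ```python
    import numpy as np

    # Hypothetical Linear layer with 2 inputs and 3 outputs: output = W @ x + b
    W = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])
    x = np.array([0.5, -1.0])

    # gradOutput = dC/d_output, as received from the next layer
    grad_output = np.array([0.1, 0.2, 0.3])

    # gradInput = Jacobian(output w.r.t. input)^T @ gradOutput.
    # For a Linear layer that Jacobian is just W.
    grad_input = W.T @ grad_output
    ```

    The resulting grad_input has the same shape as the layer's input, which is exactly what the previous layer expects as its own gradOutput.
    
    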

    does accGradParameters mean to accumulate d_C/d_Weight_L and d_C/d_bias_L

    Yes. They are named gradWeight and gradBias in torch/nn.

    how to compute these values?

    Similarly to the above. Still using a formula from the same blog post:

        d_C/d_Weight_L = (d_output_L/d_Weight_L)^T × d_C/d_output_L

    except that the Jacobian does not have the same dimensionality (see the blog post for more details). For example, for a Linear layer this translates into:

        d_C/d_Weight_L = gradOutput × input^T

    This is the outer product between the layer's input and gradOutput. In Torch we have:

    self.gradWeight:addr(scale, gradOutput, input)
    
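    A NumPy sketch of what that addr call accumulates (an illustration only, reusing the made-up shapes from above, not Torch's implementation):

    ```python
    import numpy as np

    x = np.array([0.5, -1.0])                 # layer input
    grad_output = np.array([0.1, 0.2, 0.3])   # dC/d_output
    scale = 1.0
    grad_weight = np.zeros((3, 2))            # same shape as the weight matrix

    # Equivalent of self.gradWeight:addr(scale, gradOutput, input):
    # accumulate scale * outer(gradOutput, input) into the existing gradient.
    grad_weight += scale * np.outer(grad_output, x)
    ```

    Note that the gradient is accumulated (+=), not overwritten, which is why the method is called accGradParameters.
    
    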

    And:

        d_C/d_bias_L = d_C/d_output_L

    Which is gradOutput. In Torch we have:

    self.gradBias:add(scale, gradOutput)
    

    In both cases, scale is a scaling factor that in practice is typically set to the learning rate.
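    Putting the pieces together, here is a minimal NumPy sketch of a Linear layer exposing the two backward hooks discussed above (an illustration of the math, not Torch's actual implementation; names and shapes are made up):

    ```python
    import numpy as np

    class Linear:
        """Toy Linear layer: output = W @ x + b, with Torch-style backward hooks."""

        def __init__(self, in_features, out_features):
            self.weight = np.zeros((out_features, in_features))
            self.bias = np.zeros(out_features)
            # Gradient accumulators, analogous to gradWeight / gradBias in torch/nn.
            self.grad_weight = np.zeros_like(self.weight)
            self.grad_bias = np.zeros_like(self.bias)

        def forward(self, x):
            return self.weight @ x + self.bias

        def update_grad_input(self, x, grad_output):
            # gradInput = dC/d_input = W^T @ gradOutput (chain rule)
            return self.weight.T @ grad_output

        def acc_grad_parameters(self, x, grad_output, scale=1.0):
            # Accumulate (do not overwrite) the parameter gradients,
            # mirroring gradWeight:addr(scale, gradOutput, input)
            # and gradBias:add(scale, gradOutput).
            self.grad_weight += scale * np.outer(grad_output, x)
            self.grad_bias += scale * grad_output
    ```

    During a backward pass, the framework calls both methods per layer: acc_grad_parameters to collect parameter gradients, and update_grad_input to pass the gradient on to the previous layer.
    
    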