I am currently constructing a model in PyTorch that requires multiple custom layers. I have only been defining the forward method and do not define a backward method. The model seems to run well, and the optimizer is able to update the parameters using the gradients from the layers. However, I see many people defining backward methods, and I wonder if I am missing something.
Why might you need to define a backward pass?
In very few cases should you be implementing your own backward function in PyTorch. This is because PyTorch's autograd functionality takes care of computing gradients for the vast majority of operations.
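To illustrate, here is a minimal sketch of a custom layer that defines only forward. The layer name ScaledTanh and its scale parameter are made up for this example; the point is just that autograd fills in the gradient without any backward method.

```python
import torch
import torch.nn as nn


class ScaledTanh(nn.Module):
    """Hypothetical custom layer: only forward is defined."""

    def __init__(self, features):
        super().__init__()
        # Learnable per-feature scale, initialised to 1.
        self.scale = nn.Parameter(torch.ones(features))

    def forward(self, x):
        # Built entirely from differentiable torch ops, so autograd
        # records the graph and can differentiate through it.
        return self.scale * torch.tanh(x)


torch.manual_seed(0)
layer = ScaledTanh(3)
x = torch.randn(4, 3)

loss = layer(x).sum()
loss.backward()  # no hand-written backward needed

# d(loss)/d(scale_j) is the sum over the batch of tanh(x[:, j]).
expected = torch.tanh(x).sum(dim=0)
grad_matches = torch.allclose(layer.scale.grad, expected)
```

If your layers look like this, i.e. they are composed of existing torch operations, you are most likely not missing anything.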
The most obvious exceptions are:
1. You have a function that cannot be expressed as a finite combination of other differentiable functions (for example, if you needed the incomplete gamma function, you might want to write your own forward and backward that use NumPy and/or lookup tables).
2. You're looking to speed up the computation of a particularly complicated expression for which the gradient could be drastically simplified after applying the chain rule.
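As a rough sketch of what such an exception can look like, the example below subclasses torch.autograd.Function. Softplus is only a stand-in chosen for this illustration (PyTorch already provides it): the forward pass is computed in NumPy, which autograd cannot trace, so the backward pass supplies the simplified analytic gradient, sigmoid(x), by hand. The class name NumpySoftplus is hypothetical.

```python
import numpy as np
import torch


class NumpySoftplus(torch.autograd.Function):
    """Hypothetical example: softplus(x) = log(1 + exp(x)) computed in NumPy."""

    @staticmethod
    def forward(ctx, x):
        # Keep the input around for the backward pass.
        ctx.save_for_backward(x)
        # NumPy is opaque to autograd, so no graph is recorded here.
        out = np.logaddexp(0.0, x.detach().cpu().numpy())
        return torch.as_tensor(out, dtype=x.dtype, device=x.device)

    @staticmethod
    def backward(ctx, grad_output):
        (x,) = ctx.saved_tensors
        # Chain rule with the closed-form derivative: d softplus / dx = sigmoid(x).
        return grad_output * torch.sigmoid(x)


# gradcheck compares the hand-written backward against finite differences;
# it expects double-precision inputs that require grad.
x = torch.randn(5, dtype=torch.double, requires_grad=True)
gradcheck_ok = torch.autograd.gradcheck(NumpySoftplus.apply, (x,))
```

The function is called as NumpySoftplus.apply(x) rather than by instantiating the class. Running torch.autograd.gradcheck on any hand-written backward is a cheap way to catch sign or shape mistakes.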