PyTorch convolutions are actually implemented as cross-correlations. This shouldn't cause problems when training a convolution layer, since one is just a flipped version of the other (and hence the learned function will be equally powerful), but it can be an issue whenever a true convolution is expected, for example when applying a fixed kernel through the functions in PyTorch's functional library (torch.nn.functional).
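For reference, here is a minimal check (my own sketch, not from the book) comparing F.conv2d against SciPy's correlate2d and convolve2d on a small asymmetric kernel:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import correlate2d, convolve2d

x = np.random.randn(5, 5).astype(np.float32)   # toy single-channel "image"
k = np.random.randn(3, 3).astype(np.float32)   # asymmetric kernel, so the flip matters

# PyTorch expects (N, C, H, W) inputs and (C_out, C_in, kH, kW) weights.
out = F.conv2d(torch.from_numpy(x)[None, None],
               torch.from_numpy(k)[None, None]).squeeze().numpy()

print(np.allclose(out, correlate2d(x, k, mode="valid"), atol=1e-5))                    # True
print(np.allclose(out, convolve2d(x, k, mode="valid"), atol=1e-5))                     # False
print(np.allclose(out, convolve2d(x, k[::-1, ::-1].copy(), mode="valid"), atol=1e-5))  # True: flipping the kernel recovers the mathematical convolution
```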
The authors say the following in Deep Learning with PyTorch:
Convolution, or more precisely, discrete convolution [1] ...

[1] There is a subtle difference between PyTorch's convolution and mathematics' convolution: one argument's sign is flipped. If we were in a pedantic mood, we could call PyTorch's convolutions discrete cross-correlations.
But they don't explain why it was implemented like this. Is there a reason?
Maybe something similar to how the PyTorch implementation of CrossEntropyLoss isn't actually cross-entropy, but an analogous function taking "logits" as inputs instead of raw probabilities (to avoid numerical instability)?
I think the reason is simpler. As you said, convolution is the flipped version of cross-correlation, but that's not a problem in the context of training a CNN. So we can just avoid doing the flipping, which simplifies the code and reduces the computation time:
The advantage of cross-correlation is that it avoids the additional step of flipping the filters to perform the convolutions.
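To make this concrete, here is a small experiment (my own sketch, with arbitrary sizes, seed, and learning rate) that fits the same target with a plain conv2d layer and with a "true convolution" variant that flips its weights on every forward pass:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(64, 1, 8, 8)
target_kernel = torch.randn(1, 1, 3, 3)
y = F.conv2d(x, target_kernel)                 # target produced by a fixed cross-correlation

w_xcorr = torch.zeros(1, 1, 3, 3, requires_grad=True)   # plain conv2d (cross-correlation) weights
w_conv = torch.zeros(1, 1, 3, 3, requires_grad=True)    # "true convolution" weights
opt = torch.optim.SGD([w_xcorr, w_conv], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    loss_xcorr = F.mse_loss(F.conv2d(x, w_xcorr), y)                         # no flip
    loss_conv = F.mse_loss(F.conv2d(x, torch.flip(w_conv, dims=[2, 3])), y)  # flip on every forward pass
    (loss_xcorr + loss_conv).backward()
    opt.step()

# Both layers fit the target; the learned kernels differ only by a flip.
print(torch.allclose(w_xcorr, torch.flip(w_conv, dims=[2, 3]), atol=1e-3))   # True
```

The optimizer simply absorbs the flip into the weights, so performing it explicitly on every forward and backward pass would only add work without changing what the layer can represent.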
Flipping the kernel won't have any effect on numerical stability. The operations remain the same; the flip only changes which kernel entry multiplies which input value.
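A rough way to see this (my own sketch): unlike the softmax/log computation that motivates CrossEntropyLoss taking logits, the convolution forward pass is only sums of products, so both orderings show float32 rounding error of the same magnitude against a float64 reference:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 32, 32)
w = torch.randn(1, 1, 3, 3)
w_flipped = torch.flip(w, dims=[2, 3])

# float32 results vs a float64 reference, for both orderings of the kernel
err_xcorr = (F.conv2d(x, w).double() - F.conv2d(x.double(), w.double())).abs().max()
err_conv = (F.conv2d(x, w_flipped).double() - F.conv2d(x.double(), w_flipped.double())).abs().max()

print(err_xcorr, err_conv)   # both on the order of float32 rounding error; neither variant is "more stable"
```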