PyTorch convolutions are actually implemented as cross-correlations. This shouldn't cause problems when training a convolution layer, since one is just a flipped version of the other (and hence the learned function will be equally powerful), but it can be an issue whenever a true convolution is expected, for example when applying a fixed kernel through the functions in PyTorch's functional library (torch.nn.functional).
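For reference, here is a minimal check (my own sketch, not from the book) comparing F.conv2d against SciPy's correlate2d and convolve2d on a small asymmetric kernel:

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.signal import correlate2d, convolve2d

x = np.random.randn(5, 5).astype(np.float32)   # toy single-channel "image"
k = np.random.randn(3, 3).astype(np.float32)   # asymmetric kernel, so the flip matters

# PyTorch expects (N, C, H, W) inputs and (C_out, C_in, kH, kW) weights.
out = F.conv2d(torch.from_numpy(x)[None, None],
               torch.from_numpy(k)[None, None]).squeeze().numpy()

print(np.allclose(out, correlate2d(x, k, mode="valid"), atol=1e-5))                    # True
print(np.allclose(out, convolve2d(x, k, mode="valid"), atol=1e-5))                     # False
print(np.allclose(out, convolve2d(x, k[::-1, ::-1].copy(), mode="valid"), atol=1e-5))  # True: flipping the kernel recovers the mathematical convolution
```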
The authors say the following in Deep Learning with PyTorch:
Convolution, or more precisely, discrete convolution [1] ...

[1] There is a subtle difference between PyTorch's convolution and mathematics' convolution: one argument's sign is flipped. If we were in a pedantic mood, we could call PyTorch's convolutions discrete cross-correlations.
But they don't explain why it was implemented like this. Is there a reason?
Maybe something similar to how the PyTorch implementation of CrossEntropyLoss isn't actually cross-entropy, but an analogous function taking "logits" as inputs instead of raw probabilities (to avoid numerical instability)?
I think the reason is simpler. As you said, convolution is the flipped version of cross-correlation, but that's not a problem in the context of training a CNN. So we can just avoid doing the flipping, which simplifies the code and reduces the computation time:
The advantage of cross-correlation is that it avoids the additional step of flipping the filters to perform the convolutions.
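To make this concrete, here is a small experiment (my own sketch, with arbitrary sizes, seed, and learning rate) that fits the same target with a plain conv2d layer and with a "true convolution" variant that flips its weights on every forward pass:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(64, 1, 8, 8)
target_kernel = torch.randn(1, 1, 3, 3)
y = F.conv2d(x, target_kernel)                 # target produced by a fixed cross-correlation

w_xcorr = torch.zeros(1, 1, 3, 3, requires_grad=True)   # plain conv2d (cross-correlation) weights
w_conv = torch.zeros(1, 1, 3, 3, requires_grad=True)    # "true convolution" weights
opt = torch.optim.SGD([w_xcorr, w_conv], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    loss_xcorr = F.mse_loss(F.conv2d(x, w_xcorr), y)                         # no flip
    loss_conv = F.mse_loss(F.conv2d(x, torch.flip(w_conv, dims=[2, 3])), y)  # flip on every forward pass
    (loss_xcorr + loss_conv).backward()
    opt.step()

# Both layers fit the target; the learned kernels differ only by a flip.
print(torch.allclose(w_xcorr, torch.flip(w_conv, dims=[2, 3]), atol=1e-3))   # True
```

The optimizer simply absorbs the flip into the weights, so performing it explicitly on every forward and backward pass would only add work without changing what the layer can represent.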
Flipping the kernel won't have any effect on numerical stability. The operations remain the same; the flip only changes which kernel entry multiplies which input value.
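A rough way to see this (my own sketch): unlike the softmax/log computation that motivates CrossEntropyLoss taking logits, the convolution forward pass is only sums of products, so both orderings show float32 rounding error of the same magnitude against a float64 reference:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 32, 32)
w = torch.randn(1, 1, 3, 3)
w_flipped = torch.flip(w, dims=[2, 3])

# float32 results vs a float64 reference, for both orderings of the kernel
err_xcorr = (F.conv2d(x, w).double() - F.conv2d(x.double(), w.double())).abs().max()
err_conv = (F.conv2d(x, w_flipped).double() - F.conv2d(x.double(), w_flipped.double())).abs().max()

print(err_xcorr, err_conv)   # both on the order of float32 rounding error; neither variant is "more stable"
```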