I have been trying to understand the convolution lowering operation shown in the cuDNN paper. I was able to understand most of it by reading through and mapping various parameters to the image below. However, I am unable to understand how the original input data (NCHW) was converted into the Dm matrix shown in red.
The ordering of the elements of the Dm matrix does not make sense. Can someone please explain this?
Each column of Dm corresponds to a tile of the original image. Two examples are shown below:
There is no simple mathematical description of how to extract these tiles (the authors call it "non-trivial"), but there are some general comments in section 3.1 of the paper.
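To make the tile-to-column mapping concrete, here is a minimal sketch in plain Python. It assumes stride 1, no padding, and a (c, r, s) row-major ordering inside each column; the function name `lower_input` and the tiny input are made up for illustration, and the row ordering in the paper's figure may well differ from this one.

```python
def lower_input(D, R, S):
    """Build Dm from an NCHW input D[n][c][h][w] for an R x S filter.

    Assumes stride 1 and no padding. Dm has C*R*S rows and N*P*Q columns,
    where P = H - R + 1 and Q = W - S + 1.
    """
    N, C, H, W = len(D), len(D[0]), len(D[0][0]), len(D[0][0][0])
    P, Q = H - R + 1, W - S + 1
    columns = []
    for n in range(N):
        for p in range(P):
            for q in range(Q):
                # One column = the C x R x S tile whose top-left corner is (p, q),
                # flattened in (c, r, s) order (an assumed ordering).
                tile = [D[n][c][p + r][q + s]
                        for c in range(C) for r in range(R) for s in range(S)]
                columns.append(tile)
    # Transpose the list of columns into a row-major matrix.
    return [list(row) for row in zip(*columns)]


# Tiny example: N=1, C=1, a 3x3 image with entries 0..8, and a 2x2 filter.
D = [[[[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]]]
Dm = lower_input(D, R=2, S=2)
for row in Dm:
    print(row)
```

Reading the result column by column, the first column is the top-left 2x2 tile (0, 1, 3, 4) and the last column is the bottom-right tile (4, 5, 7, 8).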
A couple of notes:
- The layout of Dm and Fm is flexible: you could permute the rows of Dm and the columns of Fm, or vice-versa.
- cuDNN does not construct Dm in full; rather, it lazily generates columns of Dm as they are needed (see section 3.1 of the paper).
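Both notes can be checked on a toy example. The sketch below is plain Python under the same assumptions as before (stride 1, no padding, an assumed (c, r, s) ordering); `dm_columns`, the filter values, and the permutation are hypothetical, and this is only an illustration of the idea, not how cuDNN implements it on the GPU. It generates the columns of Dm one at a time instead of storing the matrix, and it confirms that applying the same permutation to the columns of Fm and the rows of Dm leaves the product unchanged.

```python
def dm_columns(D, R, S):
    """Yield the columns of Dm one at a time (stride 1, no padding)."""
    N, C, H, W = len(D), len(D[0]), len(D[0][0]), len(D[0][0][0])
    for n in range(N):
        for p in range(H - R + 1):
            for q in range(W - S + 1):
                yield [D[n][c][p + r][q + s]
                       for c in range(C) for r in range(R) for s in range(S)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


D = [[[[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]]]]
# Hypothetical filter matrix: K=2 filters, each flattened to C*R*S = 4 entries.
Fm = [[1, 2, 3, 4],
      [0, 1, 0, -1]]

# Lazy product: each output column needs only one column of Dm at a time.
out_cols = [[dot(f_row, col) for f_row in Fm] for col in dm_columns(D, 2, 2)]

# Apply the same permutation to the columns of Fm and to the rows of Dm
# (that is, to the entries of every Dm column).
perm = [2, 0, 3, 1]
Fm_perm = [[row[i] for i in perm] for row in Fm]
out_cols_perm = [[dot(f_row, [col[i] for i in perm]) for f_row in Fm_perm]
                 for col in dm_columns(D, 2, 2)]

print(out_cols == out_cols_perm)
```

The comparison holds because the permutation only reorders the terms of each dot product, which is why the exact row ordering of Dm is a free choice as long as Fm is laid out to match.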