
How can I vectorize my custom pytorch conv2d operation?


I have a backwards pass of a custom conv2d layer that I think can be vectorized.

for s in range(num_samples):
    for c in range(num_channels):
        # (1, 1, H, W): one sample, one channel, used as the conv input
        temp_input = input[s, c, :, :].unsqueeze(0).unsqueeze(0)
        # (num_filters, 1, oH, oW): this sample's output gradients, used as the filters
        temp_doutput = doutput[s, :, :, :].unsqueeze(1)
        # stride and dilation are swapped on purpose: this is the standard trick
        # for computing the weight gradient of a strided/dilated convolution
        temp_conv2d = torch.nn.functional.conv2d(
            temp_input, temp_doutput,
            stride=dilation, padding=padding, dilation=stride, groups=groups,
        ).squeeze(0)  # -> (num_filters, kH', kW')
        # trim any excess rows/columns down to the kernel size
        cut_conv2d = temp_conv2d[:, :kernel_size, :kernel_size]
        grad_w[:, c, :, :] += cut_conv2d

I suspect this can be vectorized because, as written, an input of 50,000 images of size 512x512 with the current configuration would require billions of separate conv2d calls for a single epoch. I'm guessing that's not right, but I can't think of a way to vectorize this any further.
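To put a rough number on the loop count, a quick sketch (the channel count here is an assumed figure for illustration):

```python
# Hypothetical figures: 50,000 images; channel count of 3 is assumed.
num_samples = 50_000
num_channels = 3

# The nested loop issues one conv2d call per (sample, channel) pair.
calls_per_pass = num_samples * num_channels
print(calls_per_pass)  # 150000 separate conv2d launches per backward pass
```

For a deeper layer with hundreds of channels, multiplied over many passes, the call count grows accordingly.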


Solution

  • The idea behind vectorizing here is to fold the looped dimensions into dimensions that conv2d already operates over, instead of iterating in Python. Since conv2d sums over a filter's input channels, transposing both tensors moves the sample axis into the channel position: a single call then computes every (filter, channel) gradient and sums over samples at the same time.

    # Samples become channels; channels become the batch
    # (assumes groups=1, matching the per-channel loop above)
    input_t = input.transpose(0, 1)      # (num_channels, num_samples, H, W)
    doutput_t = doutput.transpose(0, 1)  # (num_filters, num_samples, oH, oW)
    
    temp_conv2d = torch.nn.functional.conv2d(input_t, doutput_t, stride=dilation, padding=padding, dilation=stride)
    
    # (num_channels, num_filters, kH', kW') -> (num_filters, num_channels, kH', kW')
    grad_w += temp_conv2d.transpose(0, 1)[:, :, :kernel_size, :kernel_size]
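As a sanity check, a self-contained sketch comparing the per-sample loop against the transposed single-call version on small random tensors (all shapes and hyperparameters here are made up for the test, and groups is assumed to be 1):

```python
import torch
import torch.nn.functional as F

# Hypothetical small shapes for a quick equivalence check
num_samples, num_channels, num_filters = 4, 3, 6
H = W = 8
kernel_size, stride, padding, dilation = 3, 1, 0, 1

input = torch.randn(num_samples, num_channels, H, W)
# doutput's spatial size matches the forward conv output
out_h = (H + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
doutput = torch.randn(num_samples, num_filters, out_h, out_h)

# Reference: the original per-sample, per-channel loop
grad_w_loop = torch.zeros(num_filters, num_channels, kernel_size, kernel_size)
for s in range(num_samples):
    for c in range(num_channels):
        temp_input = input[s, c].unsqueeze(0).unsqueeze(0)   # (1, 1, H, W)
        temp_doutput = doutput[s].unsqueeze(1)               # (F, 1, oH, oW)
        t = F.conv2d(temp_input, temp_doutput,
                     stride=dilation, padding=padding, dilation=stride).squeeze(0)
        grad_w_loop[:, c] += t[:, :kernel_size, :kernel_size]

# Vectorized: fold samples into the channel dimension via transpose
t = F.conv2d(input.transpose(0, 1), doutput.transpose(0, 1),
             stride=dilation, padding=padding, dilation=stride)
grad_w_vec = t.transpose(0, 1)[:, :, :kernel_size, :kernel_size]

print(torch.allclose(grad_w_loop, grad_w_vec, atol=1e-4))
```

The two results should agree to floating-point accumulation error, since the conv over the transposed tensors performs exactly the per-sample sum the loop accumulated with `+=`.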