Tags: python, optimization, deep-learning, pytorch, convolution

How to speed up convolution with kernels that contain many zeros?


Suppose there are two sets of kernels with the same shape, and almost half of the elements in the kernels are zero. For the same input x, each set of kernels is used to compute a convolution. See the PyTorch example code below:

import torch.nn as nn
import torch.nn.functional as F

# initialization
conv = nn.Conv2d(c, n, k)          # weight shape: (n, c, k, k)

# forward
kernel1 = F.relu(conv.weight)      # keeps the positive weights, zeroes out the rest
kernel2 = F.relu(-conv.weight)     # keeps the (negated) negative weights, zeroes out the rest
out1 = F.conv2d(x, kernel1)
out2 = F.conv2d(x, kernel2)

Since zero times anything is zero, nearly half of the multiplications in these convolutions are unnecessary. My question is: is it possible to speed up the two convolutions? Can the two convolutions be computed in parallel?


Solution

  • If some entire kernels are all zeros, we can simply remove the corresponding input channels from the subsequent layer in the network. This operation is equivalent to removing entire rows/columns from the unrolled patches and convolutional weights that are then used as a matrix multiplication on the GPU. The main point is that this operation is still a dense matrix multiplication, just with fewer rows/columns (see the pruning sketch after this list).

    However, if the weights are sparse, i.e. individual entries are zeroed out with no apparent structure, there is no easy way to translate this into a performance improvement on standard hardware. GPUs are good at dense matrix multiplication, not sparse matrix multiplication, which instead has to keep track of the zeros and skip those multiplications.

    Maybe under extreme levels of sparsity (> 90%) you can translate this theoretical reduction in multiplications into a reduced memory footprint or lower latency on GPUs, but doing so is non-trivial; the second sketch below shows what the sparse formulation looks like.
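
To make the first point concrete, here is a minimal structured-pruning sketch. It is not from the original answer: the layer sizes and the names conv1, conv2, pruned1, pruned2 are made up for illustration. It drops the all-zero output channels of one convolution and the matching input channels of the next, leaving a smaller dense computation that produces the same result.

import torch
import torch.nn as nn

# Toy sizes, chosen only for illustration.
c, n, m, k = 8, 16, 32, 3
conv1 = nn.Conv2d(c, n, k, bias=False)   # bias=False so an all-zero kernel
conv2 = nn.Conv2d(n, m, k)               # really contributes nothing

with torch.no_grad():
    conv1.weight[::2] = 0.0              # pretend half the kernels are all zeros

    # Keep only the output channels of conv1 whose kernel has a nonzero entry.
    keep = conv1.weight.abs().sum(dim=(1, 2, 3)) != 0

    pruned1 = nn.Conv2d(c, int(keep.sum()), k, bias=False)
    pruned1.weight.copy_(conv1.weight[keep])

    # Drop the matching input channels from the next layer.
    pruned2 = nn.Conv2d(int(keep.sum()), m, k)
    pruned2.weight.copy_(conv2.weight[:, keep])
    pruned2.bias.copy_(conv2.bias)

x = torch.randn(1, c, 32, 32)
ref = conv2(conv1(x))
fast = pruned2(pruned1(x))
print(torch.allclose(ref, fast, atol=1e-5))   # same output, smaller dense matmuls

This is the structured (channel-level) case that maps cleanly onto faster dense kernels; masking utilities such as torch.nn.utils.prune only zero weights out, so the speedup comes from physically removing the channels as above.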
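
And to illustrate the dense-versus-sparse comparison in the second point, the sketch below (again not from the original answer; the sizes and names are illustrative) writes the convolution as a matrix multiplication over unrolled patches and runs it once with a dense weight matrix and once with a sparse one. Both give the same numbers; whether the sparse version is ever faster depends on the sparsity level and the hardware, as discussed above.

import torch
import torch.nn.functional as F

n, c, k = 64, 32, 3
x = torch.randn(1, c, 56, 56)

w = torch.randn(n, c, k, k)
w[torch.rand_like(w) < 0.95] = 0.0           # ~95% unstructured sparsity

patches = F.unfold(x, kernel_size=k)[0]      # im2col: (c*k*k, L)
dense_w = w.reshape(n, -1)                   # (n, c*k*k)

out_dense = dense_w @ patches                                # dense matmul over unrolled patches
out_sparse = torch.sparse.mm(dense_w.to_sparse(), patches)   # sparse x dense matmul

print(torch.allclose(out_dense, out_sparse, atol=1e-4))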