Tags: pytorch, conv-neural-network, spatial-pooling

How to use pooling on rows, i.e. go from shape (512, 20, 32) to (512, 10, 32)?


I have data with shape (512, 20, 32) and I want to apply average pooling to get shape (512, 10, 32).

I tried the following, with no success:

import torch
import torch.nn as nn

pool = nn.AvgPool1d(kernel_size=2, stride=2)
data = torch.rand(512, 20, 32)
out  = pool(data)
print(out.shape)

Output:

torch.Size([512, 20, 16])

How can I run the pooling along the rows (the middle dimension) instead?


Solution

  • In PyTorch, nn.AvgPool1d expects tensors in batch-channel-length order and pools over the length dimension (the last one), which is why your attempt halved the last dimension instead of the rows.

    There are two workarounds I can think of:

    1. Permuting the dimensions

    You can swap the last two dimensions before pooling and swap them back afterwards, using transpose:

    pool = nn.AvgPool1d(kernel_size=2, stride=2)
    data = torch.rand(512, 20, 32)
    data_permuted = data.transpose(1, 2)  # (512, 32, 20): rows are now the last dimension
    out_permuted = pool(data_permuted)    # (512, 32, 10)
    out = out_permuted.transpose(1, 2)    # (512, 10, 32)
    
    print(out.shape)
    

    Output:

    torch.Size([512, 10, 32])
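
    As an aside, the same permute-and-pool pattern can be written as a one-liner with the functional API; this is just a sketch, assuming torch.nn.functional is imported as F:

    import torch.nn.functional as F

    # move the row dimension to the end, pool it, then move it back
    out = F.avg_pool1d(data.transpose(1, 2), kernel_size=2, stride=2).transpose(1, 2)
    print(out.shape)  # torch.Size([512, 10, 32])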
    

    2. Adding a singleton channel dimension and using 2D pooling

    Adding an "empty" (singleton) channel dimension turns your 3D input into a 4D tensor of shape (512, 1, 20, 32), so nn.AvgPool2d can pool along the height dimension only. The singleton dimension is added and removed with unsqueeze and squeeze:

    pool = nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1))  # pool only along the height dimension
    data = torch.rand(512, 20, 32)
    out = pool(data.unsqueeze(dim=1)).squeeze(dim=1)  # (512, 1, 20, 32) -> (512, 1, 10, 32) -> (512, 10, 32)
    
    print(out.shape)
    

    Output:

    torch.Size([512, 10, 32])
    

    I believe the second option is more intuitive and might be more efficient than the first one.
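
    Either way, a quick sanity check is to run both variants on the same input and compare the results; a minimal sketch, assuming the same random (512, 20, 32) tensor:

    import torch
    import torch.nn as nn

    data = torch.rand(512, 20, 32)

    # option 1: transpose so the row dimension is pooled, then transpose back
    out1 = nn.AvgPool1d(kernel_size=2, stride=2)(data.transpose(1, 2)).transpose(1, 2)

    # option 2: add a singleton channel dimension and pool along height only
    out2 = nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1))(data.unsqueeze(1)).squeeze(1)

    print(out1.shape, out2.shape)      # torch.Size([512, 10, 32]) for both
    print(torch.allclose(out1, out2))  # True: both average the same pairs of rows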