Tags: pytorch, conv-neural-network, spatial-pooling

How to use pooling on rows, i.e. go from shape (512, 20, 32) to (512, 10, 32)?


I have data with shape (512, 20, 32) and I want to apply average pooling to get shape (512, 10, 32).

I tried the following, with no success:

import torch
import torch.nn as nn

pool = nn.AvgPool1d(kernel_size=2, stride=2)
data = torch.rand(512, 20, 32)
out  = pool(data)
print(out.shape)

Output:

torch.Size([512, 20, 16])

How can I run the pooling along the rows (the middle dimension) instead?


Solution

  • In PyTorch, nn.AvgPool1d expects tensors in batch-channel-length order and pools over the length dimension (the last one), which is why your attempt halved the last dimension instead of the rows.

    There are two workarounds I can think of:

    1. Permuting the dimensions

    You can swap the last two dimensions before pooling and swap them back afterwards, using transpose:

    pool = nn.AvgPool1d(kernel_size=2, stride=2)
    data = torch.rand(512, 20, 32)
    data_permuted = data.transpose(1, 2)  # (512, 32, 20): rows are now the last dimension
    out_permuted = pool(data_permuted)    # (512, 32, 10)
    out = out_permuted.transpose(1, 2)    # (512, 10, 32)
    
    print(out.shape)
    

    Output:

    torch.Size([512, 10, 32])
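
    As an aside, the same permute-and-pool pattern can be written as a one-liner with the functional API; this is just a sketch, assuming torch.nn.functional is imported as F:

    import torch.nn.functional as F

    # move the row dimension to the end, pool it, then move it back
    out = F.avg_pool1d(data.transpose(1, 2), kernel_size=2, stride=2).transpose(1, 2)
    print(out.shape)  # torch.Size([512, 10, 32])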
    

    2. Adding a singleton channel dimension and using 2D pooling

    Adding an "empty" (singleton) channel dimension turns your 3D input into a 4D tensor of shape (512, 1, 20, 32), so nn.AvgPool2d can pool along the height dimension only. The singleton dimension is added and removed with unsqueeze and squeeze:

    pool = nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1))  # pool only along the height dimension
    data = torch.rand(512, 20, 32)
    out = pool(data.unsqueeze(dim=1)).squeeze(dim=1)  # (512, 1, 20, 32) -> (512, 1, 10, 32) -> (512, 10, 32)
    
    print(out.shape)
    

    Output:

    torch.Size([512, 10, 32])
    

    I believe the second option is more intuitive and might be more efficient than the first one.
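
    Either way, a quick sanity check is to run both variants on the same input and compare the results; a minimal sketch, assuming the same random (512, 20, 32) tensor:

    import torch
    import torch.nn as nn

    data = torch.rand(512, 20, 32)

    # option 1: transpose so the row dimension is pooled, then transpose back
    out1 = nn.AvgPool1d(kernel_size=2, stride=2)(data.transpose(1, 2)).transpose(1, 2)

    # option 2: add a singleton channel dimension and pool along height only
    out2 = nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1))(data.unsqueeze(1)).squeeze(1)

    print(out1.shape, out2.shape)      # torch.Size([512, 10, 32]) for both
    print(torch.allclose(out1, out2))  # True: both average the same pairs of rows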