python numpy resnet torchvision permute

How to use torch.Tensor.permute in torchvision model


I am using a ResNet50 classification model from torchvision, which by default accepts images as inputs. I want the model to accept NumPy files (.npy) as inputs instead. I understand that the two use different dimension orders, as the NumPy data is laid out as

[batch_size, depth, height, width, channels] 

instead of

[batch_size, channels, depth, height, width]. 

Based on this answer, I can use the permute function to change the order of the dimensions. However, I can't find any solution or pointers on how to apply it with a torchvision model.


Solution

  • Let's say you have a tensor x with the dimensions

    [batch_size, depth, height, width, channels] 
    

    and you want to get a tensor y with dimensions

    [batch_size, channels, depth, height, width]
    

    The permute() method reorders these dimensions. You pass it the indices of the original dimensions in the order in which they should appear in the result, that is

    y = x.permute([0, 4, 1, 2, 3])
    

    To see why, note that in the original tensor x the dimensions are enumerated as

    [batch_size, depth, height, width, channels]
     0           1      2       3      4
    

    In the new tensor we therefore get

    [batch_size, channels, depth, height, width]
     0           4         1      2       3
    

    and the second row, [0, 4, 1, 2, 3], is exactly what we need to pass to permute().
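As a quick check of the index mapping above, here is a minimal sketch. The zero-filled array is only a stand-in for data loaded from a .npy file (the file name in the comment is hypothetical), and the sizes are arbitrary.

```python
import numpy as np
import torch

# Stand-in for something like np.load("sample.npy") (hypothetical file name),
# laid out as [batch_size, depth, height, width, channels].
array = np.zeros((2, 4, 8, 8, 3), dtype=np.float32)
x = torch.from_numpy(array)

# Move the channel axis (index 4) to position 1; the other axes keep their order.
y = x.permute(0, 4, 1, 2, 3)
print(y.shape)  # torch.Size([2, 3, 4, 8, 8])

# permute() returns a view with rearranged strides, so the result here is not
# contiguous in memory. Call .contiguous() if a later op (e.g. .view()) needs it.
print(y.is_contiguous())  # False
y = y.contiguous()
```

Note that permute() accepts the indices either as separate arguments, as here, or as a single list, as in the answer.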

    Alternatively, you can use torch.einsum(), where you write out the dimension signatures of the input and the output directly, which you may find more intuitive:

    y = torch.einsum('b d h w c -> b c d h w', x)
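To apply this inside a model, one possible approach is to wrap the network in a small module that permutes its input before forwarding it. The sketch below is an assumption-laden illustration, not the torchvision way of doing it: `ChannelsLastToFirst` is a hypothetical name, and the `nn.Conv3d` layer is only a stand-in for a network that takes channels-first 5-D input. Keep in mind that torchvision's resnet50 itself expects 4-D input of the form [batch_size, channels, height, width], so data with a depth axis would need a 3D model or would have to be reduced to 4-D first.

```python
import torch
from torch import nn


class ChannelsLastToFirst(nn.Module):
    """Hypothetical wrapper: permutes [B, D, H, W, C] to [B, C, D, H, W],
    then calls the wrapped model."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.permute(0, 4, 1, 2, 3).contiguous()
        return self.model(x)


# Stand-in for a real network that expects channels-first 5-D input.
backbone = nn.Conv3d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
wrapped = ChannelsLastToFirst(backbone)

x = torch.randn(2, 4, 8, 8, 3)  # [B, D, H, W, C]
out = wrapped(x)
print(out.shape)  # torch.Size([2, 8, 4, 8, 8])

# The einsum signature written without spaces gives the same result as permute().
same = torch.equal(torch.einsum('bdhwc->bcdhw', x), x.permute(0, 4, 1, 2, 3))
print(same)  # True
```

Doing the permute in a wrapper keeps the data loading code unchanged; the same permute call could instead be placed in the dataset or collate step if you prefer to keep the model untouched.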