I am using a ResNet50 classification model from torchvision, which by default accepts images as inputs. I want to make the model accept NumPy files (.npy) as inputs. I understand the two have different dimension orders, as the NumPy data is given as
[batch_size, depth, height, width, channels]
instead of
[batch_size, channels, depth, height, width].
Based on this answer, I can use the permute function to change the order of the dimensions. However, I can't find any solution or leads on how to do this in a torchvision model.
Let's say you have a tensor x with the dimensions
[batch_size, depth, height, width, channels]
and you want to get a tensor y with the dimensions
[batch_size, channels, depth, height, width]
The permute() method reorders these dimensions. For each position in the new tensor, you specify the index of the original dimension that should end up there, that is
y = x.permute([0, 4, 1, 2, 3])
If we analyze that, in the original tensor x the dimensions were enumerated as
[batch_size, depth, height, width, channels]
0 1 2 3 4
In the new tensor we therefore get
[batch_size, channels, depth, height, width]
0 4 1 2 3
and that second row of indices is exactly what we need to pass to permute().
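As a quick check of the index mapping above, here is a minimal sketch that starts from a NumPy array and verifies the resulting shape. The array is only a stand-in for data loaded from a .npy file, and the sizes (2, 8, 16, 16, 3) are arbitrary values chosen for illustration.

```python
import numpy as np
import torch

# Stand-in for an array loaded with np.load(...); sizes are arbitrary.
# Layout: [batch_size, depth, height, width, channels]
arr = np.zeros((2, 8, 16, 16, 3), dtype=np.float32)

x = torch.from_numpy(arr)

# Reorder to [batch_size, channels, depth, height, width]
y = x.permute(0, 4, 1, 2, 3)
print(tuple(y.shape))  # (2, 3, 8, 16, 16)

# permute() returns a view with reordered strides; call .contiguous()
# if a later operation needs the data laid out contiguously in memory.
y = y.contiguous()
```

Note that permute() accepts the indices either as separate arguments, as here, or as a single list, as in the line above.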
Alternatively, you can use the torch.einsum() function, where you write out the dimension labels of the input and the output directly, which many people find more intuitive.
y = torch.einsum('b d h w c -> b c d h w', x)
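To do the reordering inside a model instead of by hand, one option is to wrap the network in a small module that permutes its input before the forward pass. This is only a sketch under assumptions: `ChannelsLastToFirst` is a hypothetical name (it is not part of torchvision), and a single `nn.Conv3d` layer stands in for whatever network actually consumes the 5D tensor.

```python
import torch
from torch import nn


class ChannelsLastToFirst(nn.Module):
    """Hypothetical wrapper: permutes [B, D, H, W, C] input to [B, C, D, H, W]."""

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Move the channel axis from last to second, then pass the result on.
        x = x.permute(0, 4, 1, 2, 3).contiguous()
        return self.model(x)


# Stand-in network (an assumption): one 3D convolution expecting 3 input channels.
backbone = nn.Conv3d(in_channels=3, out_channels=4, kernel_size=3, padding=1)
wrapped = ChannelsLastToFirst(backbone)

x = torch.randn(2, 8, 16, 16, 3)  # [batch_size, depth, height, width, channels]
out = wrapped(x)
print(tuple(out.shape))  # (2, 4, 8, 16, 16)

# The einsum form produces the same reordering as permute().
same = torch.equal(torch.einsum('bdhwc->bcdhw', x), x.permute(0, 4, 1, 2, 3))
```

Wrapping keeps the original network untouched, so the same wrapper can be reused with any module that expects channels-first input.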