I am doing action recognition with mediapipe keypoints. These are the shapes of some of my tensors:
torch.Size([3, 3, 75]) torch.Size([3, 6, 75]) torch.Size([3, 10, 75]) torch.Size([3, 11, 75]) torch.Size([3, 9, 75]) torch.Size([3, 4, 75]) torch.Size([3, 21, 75])
The height (second dimension) of each tensor varies because it is the number of frames in that sample.
I have decided that I want to consider 8 frames for each sample. I understand I have to pad the shorter tensors and truncate the ones with heights above 8, but somehow just doing the padding worked, or so it seems. I wish to understand how my code worked.
if height < 8:
    source_pad = F.pad(tensor1, pad=(0, 0, 0, 8 - height))
else:
    source_pad = F.pad(tensor1, pad=(0, 0, 0, 8 - height))
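Per the documentation, the pad arguments are specified working backwards from the last dimension, so you pad the last dimension by 0 on each side and the second-to-last dimension by 0 at the start and 8 - height at the end. 8 - height works out to be positive if height is less than 8, 0 if height = 8, and negative if height is greater than 8. A negative pad value removes that many entries instead of adding them, so tensors taller than 8 have their trailing frames cut off. Since both branches of your if/else make the same call, the single F.pad call handles padding and truncation alike.
In other words, resulting height = height + 0 + (8 - height) = 8.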
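To check this on tensors shaped like the ones in the question, here is a minimal sketch. It assumes a recent PyTorch with the default constant padding mode; fix_height and TARGET_FRAMES are hypothetical names introduced only for illustration, not part of the original code.

```python
import torch
import torch.nn.functional as F

TARGET_FRAMES = 8  # hypothetical constant: the frame count chosen in the question


def fix_height(tensor, target=TARGET_FRAMES):
    """Pad or crop dimension 1 (frames) to `target` with a single F.pad call."""
    height = tensor.shape[1]
    # pad = (last_dim_left, last_dim_right, frames_start, frames_end)
    # A negative frames_end value crops frames from the end instead of adding zeros.
    return F.pad(tensor, pad=(0, 0, 0, target - height))


# One sample shorter than 8 frames and one longer, shaped like those in the question.
short = torch.ones(3, 3, 75)
long = torch.arange(3 * 21 * 75, dtype=torch.float32).reshape(3, 21, 75)

short_fixed = fix_height(short)
long_fixed = fix_height(long)

print(short_fixed.shape)
print(long_fixed.shape)
```

Both results should come out with 8 frames: the short sample gets zero frames appended after its original 3, and the long sample keeps only its first 8 frames. If you would rather keep a different window of a long clip (for example the middle 8 frames), slice it explicitly instead of relying on negative padding.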