This is a follow-up to a couple of questions I asked before. I want to fine-tune the I3D model for action recognition from PyTorch Hub (pre-trained on Kinetics, 400 classes) on a custom dataset with 4 possible output classes.
I'm loading the model and modifying the last layer by:
model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)
num_classes = 4
model.blocks[6].proj = torch.nn.Linear(2048, num_classes)
I defined the __getitem__ method of my Dataset to return:
def __getitem__(self, ind):
    return processed_images, target
where processed_images and target are Tensors, with shapes:
torch.Size([5, 224, 224, 3])
Basically, processed_images is a sequence of 5 RGB images, each with shape (224, 224), while target is the one-hot encoding for the target classes.
In the training part, I have:
train_dataloader =
for epoch in range(number_of_epochs):
    for batch_ind, batch_data in enumerate(train_dataloader):
        # Extract data and label
        datas, labels = batch_data
        # move to device
        datas_ =
        labels_ =
        weights_ =
        # permute axes: [22, 5, 224, 224, 3] -> [22, 3, 5, 224, 224]
        datas_ = datas_.permute(0, 4, 1, 2, 3)
        preds_ = model(datas_)
But I'm getting an error in the forward method of ResNetBasicHead:
Exception has occurred: RuntimeError
input image (T: 2 H: 14 W: 14) smaller than kernel size (kT: 4 kH: 7 kW: 7)
  File "/home/c.demasi/.cache/torch/hub/facebookresearch_pytorchvideo_main/pytorchvideo/models/", line 374, in forward
    x = self.pool(x)
  File "/home/c.demasi/.cache/torch/hub/facebookresearch_pytorchvideo_main/pytorchvideo/models/", line 43, in forward
    x = block(x)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/src/", line 271, in train
    preds_ = model(datas_)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/src/", line 571, in train_roi
    train(training_parameters, train_from_existing_path=None, perform_tests=perform_tests, config=config)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/", line 13, in <module>
    train_roi(config=config, perform_tests=False)
RuntimeError: input image (T: 2 H: 14 W: 14) smaller than kernel size (kT: 4 kH: 7 kW: 7)
Any idea how to solve this?
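The issue was simply that I was using too few frames. The model was trained on longer sequences (9 frames, as far as I can tell), so shorter inputs can fail: with my 5 frames, the temporal dimension had already shrunk to 2 by the time it reached the head's pooling layer, whose kernel spans 4 steps (the T: 2 versus kT: 4 in the error message).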
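The arithmetic behind the error can be sketched in plain Python. This is only a rough sketch under assumptions, not the library's actual logic: the temporal halving factor is inferred from my 5 frames arriving at the head as T: 2, the kernel size kT = 4 is read off the error message, and temporal_size_at_head, fits_head_pool and padded_frame_indices are hypothetical helper names. Repeating the last frame is just one possible workaround for short clips; the fix described above is simply to use longer sequences.

```python
# The halving factor is an assumption inferred from 5 input frames
# arriving at the head as T: 2; it is not taken from the library source.
ASSUMED_TEMPORAL_DOWNSAMPLE = 2
HEAD_POOL_KERNEL_T = 4  # kT in the error message


def temporal_size_at_head(num_frames, downsample=ASSUMED_TEMPORAL_DOWNSAMPLE):
    """Temporal length reaching the head, assuming one floor-division downsample."""
    return num_frames // downsample


def fits_head_pool(num_frames, kernel_t=HEAD_POOL_KERNEL_T):
    """True if the head's pooling kernel fits along the temporal axis."""
    return temporal_size_at_head(num_frames) >= kernel_t


def padded_frame_indices(num_frames, min_frames):
    """Frame indices that repeat the last frame until min_frames is reached."""
    if num_frames < 1:
        raise ValueError("need at least one frame")
    return [min(i, num_frames - 1) for i in range(max(num_frames, min_frames))]


print(temporal_size_at_head(5), fits_head_pool(5))  # 2 False
print(temporal_size_at_head(8), fits_head_pool(8))  # 4 True
print(padded_frame_indices(5, 8))  # [0, 1, 2, 3, 4, 4, 4, 4]
```

Under these assumptions, the resulting indices could be used to index the frame axis of processed_images (for example processed_images[indices]) inside __getitem__, before the batch is permuted.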