PyTorch I3D ResNet model on a custom dataset

This is a follow-up to a couple of questions I asked before. I want to fine-tune the I3D model for action recognition from PyTorch Hub (pre-trained on the 400 classes of Kinetics) on a custom dataset with 4 possible output classes.

I load the model and replace the last layer as follows:

model = torch.hub.load("facebookresearch/pytorchvideo", "i3d_r50", pretrained=True)
num_classes = 4
model.blocks[6].proj = torch.nn.Linear(2048, num_classes)

I defined the __getitem__ method of my Dataset to return:

def __getitem__(self, ind):
    [...]
    return processed_images, target

where processed_images and target are Tensors, with shapes:

>>> processed_images.shape
torch.Size([5, 224, 224, 3])

>>> target.shape
torch.Size([4])

Basically, processed_images is a sequence of 5 RGB images, each 224 x 224 pixels with the channels in the last axis, while target is the one-hot encoding of the target class.

In the training part, I have:

model.train()
model.to(device)
train_dataloader = torch.utils.data.DataLoader(
        train_dataset,
        batch_size=batch_size,
        shuffle=True,
        drop_last=False,
        persistent_workers=False,
        timeout=0,
    )

for epoch in range(number_of_epochs):
    for batch_ind, batch_data in enumerate(train_dataloader):
        # Extract data and label
        datas, labels = batch_data

        # move to device
        datas_ = datas.to(device)
        labels_ = labels.to(device)
        weights_ = weights.to(device)

        # permute axes (changing from [22, 5, 224, 224, 3] -> [22, 3, 5, 224, 224])
        datas_ = datas_.permute(0, 4, 1, 2, 3)
        preds_ = model(datas_)
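
To double-check the axis order independently of the model, here is a minimal sketch of the same permute on a dummy tensor. The batch size of 2 and the name batch_cthw are only for illustration; the layout (batch, channels, frames, height, width) is what the permute in the loop above produces.

```python
import torch

# Dummy batch in the Dataset's layout: (batch, frames, height, width, channels).
# A batch of 2 (instead of 22) keeps the example light.
batch = torch.zeros(2, 5, 224, 224, 3)

# Move the channel axis to position 1: (batch, channels, frames, height, width).
batch_cthw = batch.permute(0, 4, 1, 2, 3)

print(batch_cthw.shape)
```

So the permute itself gives the intended five-dimensional layout, and the number of frames (5) ends up in the third axis.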

But I'm getting an error in the forward method of ResNetBasicHead:

Exception has occurred: RuntimeError
input image (T: 2 H: 14 W: 14) smaller than kernel size (kT: 4 kH: 7 kW: 7)
  File "/home/c.demasi/.cache/torch/hub/facebookresearch_pytorchvideo_main/pytorchvideo/models/head.py", line 374, in forward
    x = self.pool(x)
  File "/home/c.demasi/.cache/torch/hub/facebookresearch_pytorchvideo_main/pytorchvideo/models/net.py", line 43, in forward
    x = block(x)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/src/train_torch.py", line 271, in train
    preds_ = model(datas_)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/src/train_torch.py", line 571, in train_roi
    train(training_parameters, train_from_existing_path=None, perform_tests=perform_tests, config=config)
  File "/home/c.demasi/work/projects/ball_shot_action_detection_dev_environment/train.py", line 13, in <module>
    train_roi(config=config, perform_tests=False)
RuntimeError: input image (T: 2 H: 14 W: 14) smaller than kernel size (kT: 4 kH: 7 kW: 7)

Any idea how to solve this?

Solution

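The issue was simply that I was using too few frames. The model was trained on sequences of 9 frames, so any input with fewer frames than that won't work. The error message shows the effect: by the time my 5-frame clips reached the head, their temporal dimension had shrunk to 2, which is smaller than the temporal size of the pooling kernel (4).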
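
For anyone who wants to see the failure in isolation, here is a rough sketch that applies a pooling layer with the kernel size from the error message, (4, 7, 7), to feature maps with different temporal lengths. The channel count of 8 is an arbitrary stand-in, and pad_clip is a hypothetical helper (not part of pytorchvideo) that repeats the last frame until a clip reaches a minimum length. I have not checked how repeating frames affects accuracy, so treat it as a stopgap rather than a recommendation.

```python
import torch

# Same pooling kernel as in the error message: (kT, kH, kW) = (4, 7, 7).
pool = torch.nn.AvgPool3d(kernel_size=(4, 7, 7), stride=1)


def head_pool_ok(t):
    """Return True if a (1, 8, t, 14, 14) feature map can be pooled."""
    features = torch.zeros(1, 8, t, 14, 14)  # 8 channels: arbitrary stand-in
    try:
        pool(features)
        return True
    except RuntimeError:
        # Raised when the input is smaller than the kernel in any dimension.
        return False


def pad_clip(clip, min_frames):
    """Hypothetical helper: repeat the last frame of a (T, H, W, C) clip
    until it has at least min_frames frames."""
    missing = min_frames - clip.shape[0]
    if missing <= 0:
        return clip
    last = clip[-1:].expand(missing, -1, -1, -1)
    return torch.cat([clip, last], dim=0)


print(head_pool_ok(2))  # temporal size seen in the error message
print(head_pool_ok(4))  # temporal size equal to the kernel's kT
print(pad_clip(torch.zeros(5, 224, 224, 3), min_frames=9).shape)
```

In a Dataset, pad_clip could be called at the end of __getitem__ before returning processed_images, although sampling more real frames per clip is the cleaner fix when the source video allows it.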