I am training a deep learning multi-target tracking model on video sequences. The video frames were extracted and annotated at 1fps.
To exploit temporal coherence, I have also extracted the 24 intermediate frames between every 2 annotated frames. I now have all frames at 25fps, but ground-truth labels are available only for every 25th frame (the originally annotated ones).
I want to train the model by feeding all the smooth 25fps frames through the forward pass, but compute and backpropagate the loss only for the annotated 1fps frames.
Any hint on how I should go about this, especially when my mini-batch size is less than 25?
One thing I am doing so far is assigning a -1 label to the un-annotated frames and skipping them when computing the loss (see the sketch below). This works but may be sub-optimal. Does anyone have a better idea?
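
To make the masking idea concrete, here is a minimal sketch of what I mean, written for a per-frame classification-style loss purely for illustration (my actual tracking loss is more involved, but the selection logic is the same); `masked_loss`, `IGNORE_LABEL`, and the tensor shapes are just placeholders:

```python
import torch
import torch.nn.functional as F

IGNORE_LABEL = -1  # sentinel label for un-annotated frames

def masked_loss(logits, labels):
    """Compute the loss only on frames whose label is not IGNORE_LABEL.

    logits: (batch, num_classes) per-frame predictions
    labels: (batch,) ground-truth ids, IGNORE_LABEL for un-annotated frames
    """
    mask = labels != IGNORE_LABEL          # True only for annotated frames
    if mask.sum() == 0:
        # Whole mini-batch is un-annotated: return a zero that is still
        # connected to the graph so backward() remains valid.
        return logits.sum() * 0.0
    return F.cross_entropy(logits[mask], labels[mask])
```

For a plain cross-entropy loss, `F.cross_entropy(logits, labels, ignore_index=-1)` would do the same thing without an explicit mask, but the mask generalizes to other losses. The zero-loss branch is there because with a mini-batch size below 25 a batch can easily contain no annotated frame at all.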