Tags: python, lstm, video-processing, pose-estimation, mediapipe

How do I set the input to an LSTM (pose recognition through videos) if my videos have variable numbers of frames?


I have 2 poses to classify. For each pose I have 60 video samples, but the problem is that the total number of frames differs from video to video. In that case the input to the LSTM will be uneven. Is there any way to solve this, or do we need videos with the same number of frames?

Detail: the inputs are the keypoints extracted from each frame. Suppose each frame yields 100 keypoint values; then a video with 60 frames gives 6000 values in total, while a video with 75 frames gives 7500.

In the first case the shape is (x, y, 6000); in the second it is (x, y, 7500). But the input_shape of the LSTM (or any other NN) has to be set to one constant value (say (x, y, 6000)).

And that is just two cases; I have over 50 videos. How can I solve this problem?
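
For concreteness, here is a minimal sketch of the mismatch, assuming each frame is flattened to a vector of 100 keypoint values and each video is stored as a (frames, features) array (the variable names are illustrative):

```python
import numpy as np

N_FEATURES = 100  # keypoint values extracted per frame

# Two videos with different frame counts give differently shaped arrays
video_a = np.random.rand(60, N_FEATURES)  # (60, 100) -> 6000 values
video_b = np.random.rand(75, N_FEATURES)  # (75, 100) -> 7500 values

# They cannot be stacked into a single (samples, frames, features) batch:
# np.stack([video_a, video_b])  # raises ValueError: shapes don't match
```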


Solution

    1. Zero-padding the missing frames is one solution: pad every video with all-zero frames so that the total number of frames per video is uniform across the whole set (see the first sketch after this list).

    2. Another way to reach a uniform number of frames is to make multiple copies of the first and last frames and append them until each video hits the target length (see the second sketch below).
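
A minimal sketch of option 1 in plain NumPy, assuming every video's keypoints are stored as a (frames, features) array (`pad_videos` is a hypothetical helper, not a library function):

```python
import numpy as np

def pad_videos(videos, value=0.0):
    """Zero-pad a list of (frames, features) arrays to the longest length."""
    max_frames = max(v.shape[0] for v in videos)
    n_features = videos[0].shape[1]
    batch = np.full((len(videos), max_frames, n_features), value, dtype=np.float32)
    for i, v in enumerate(videos):
        batch[i, :v.shape[0]] = v  # frames beyond v's own length stay zero
    return batch

videos = [np.random.rand(60, 100), np.random.rand(75, 100)]
x = pad_videos(videos)
print(x.shape)  # (2, 75, 100) -- a constant input_shape of (75, 100)
```

With Keras specifically, `tf.keras.preprocessing.sequence.pad_sequences(videos, padding='post', dtype='float32')` can do the same job, and a `Masking(mask_value=0.0)` layer in front of the LSTM lets it ignore the padded timesteps.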
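
And a sketch of option 2, replicating the edge frames instead of padding with zeros; splitting the missing frames evenly between the start and the end is an arbitrary choice here, and `pad_with_edge_frames` is again a made-up name:

```python
import numpy as np

def pad_with_edge_frames(video, target_frames):
    """Repeat the first and last frames until the video has target_frames."""
    missing = target_frames - video.shape[0]
    if missing <= 0:
        return video[:target_frames]  # already long enough
    front = missing // 2              # copies of the first frame
    back = missing - front            # copies of the last frame
    # mode='edge' replicates the boundary rows, i.e. frame 0 and frame -1
    return np.pad(video, ((front, back), (0, 0)), mode='edge')

padded = pad_with_edge_frames(np.random.rand(60, 100), 75)
print(padded.shape)  # (75, 100)
```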