How to format features for SVM for human recognition?

I am using Eigenjoints skeleton features to perform human action recognition in MATLAB.

I have 320 videos, so the training data is a 320x1 cell array. Each cell contains an Nx2970 double array, where N is the number of frames (it varies because each video has a different number of frames) and 2970 is the number of features extracted from each frame (it is constant because I use the same extraction method for all videos).

How can I format the training data into a 2D double matrix to use as input for an SVM? I don't know how to do it because the SVM requires a single double matrix, and what I have is one matrix per video, each of a different size.

Solution

Your question is a bit unclear about how you want to go about classifying human motion from your video. You have two options:

1. Look at each frame in isolation. This classifies each frame separately, so it is essentially a pose classifier.
2. Build a new feature that treats the data as a time series. This classifies each video clip as a whole.

Single Frame Classification

For the first option, the solution to your problem is simple. You concatenate all the frames into one big matrix with one row per frame, and repeat each video's label once for every frame it contains.

Let me give a toy example. I've made X_cell, a cell array holding one video with 2 frames and one video with 3 frames, each frame having 3 features. In your question, you don't specify where you get your ground truth labels from. I'm going to assume that you have per-video labels stored in a vector video_labels.

 X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]}; 
 video_labels = [1, 0];

One simple way to concatenate these is to use a for loop:

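X = [];
Y = [];
for ii = 1:length(X_cell)
     % Stack this video's frames under the frames collected so far
     X = [X; X_cell{ii}];
     % Repeat the video's label once per frame, as a column
     Y = [Y; repmat(video_labels(ii), size(X_cell{ii},1), 1)];
end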

Growing X and Y inside the loop reallocates memory on every iteration, so there is probably a more efficient solution. You could think about vectorizing this code if you need to improve speed.
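As one possible vectorized version of the loop above, here is a sketch that uses the built-ins vertcat, cellfun, and repelem (repelem needs R2015a or later). The variable frames_per_video is a name introduced here for illustration, and the toy X_cell and video_labels are repeated so the snippet runs on its own.

```matlab
% Toy data from the example above: a 2-frame video and a 3-frame video
X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]};
video_labels = [1, 0];

% Stack every video's frames vertically into one (total_frames x features) matrix
X = vertcat(X_cell{:});

% Count the frames in each video (illustrative helper variable)
frames_per_video = cellfun(@(v) size(v, 1), X_cell);

% Repeat each video's label once per frame, giving a column aligned with X
Y = repelem(video_labels(:), frames_per_video(:));
```

This avoids growing the arrays inside a loop, which usually matters more as the number of videos increases.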

Whole Video Classification

Time series features are a whole topic in themselves. The simplest thing you could do here is resize all the video clips to the same number of frames using imresize (from the Image Processing Toolbox), and then vectorize the resulting matrix. This creates a very long, redundant feature: with 2970 features per frame and 10 frames, each video becomes a single row of 29,700 values.

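num_frames = 10; % The desired video length, in frames
length_frame_feature = size(X_cell{1}, 2); % Features per frame (3 in the toy example, 2970 in your data)
num_videos = length(X_cell);
X = zeros(num_videos, length_frame_feature*num_frames);
for ii=1:num_videos
    % Resize to num_frames rows; the number of feature columns is unchanged
    video_feature = imresize(X_cell{ii}, [num_frames, length_frame_feature]);
    X(ii, :) = video_feature(:); % Flatten column-wise into one row per video
end
Y = video_labels(:); % One label per video, as a column matching the rows of X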
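If the Image Processing Toolbox is not available, the same resampling idea can be sketched with interp1 from base MATLAB. Note that this swaps imresize's default bicubic interpolation for plain linear interpolation along the frame axis, so the numbers will differ slightly, and it assumes every video has at least two frames. The names t_old, t_new, and resampled are introduced here for illustration.

```matlab
% Toy data from the example above: a 2-frame video and a 3-frame video
X_cell = {[1 1 1; 2 2 2], [3 3 3; 4 4 4; 5 5 5]};
video_labels = [1, 0];

num_frames = 10;                          % desired number of frames per video
num_features = size(X_cell{1}, 2);        % features per frame
num_videos = numel(X_cell);
X = zeros(num_videos, num_frames * num_features);

t_new = linspace(0, 1, num_frames);       % common normalized time axis
for ii = 1:num_videos
    video = X_cell{ii};
    n = size(video, 1);                   % assumption: n >= 2
    t_old = linspace(0, 1, n);            % this video's normalized time axis
    % interp1 interpolates each column (feature) independently over time
    resampled = interp1(t_old, video, t_new, 'linear');
    X(ii, :) = resampled(:)';             % flatten column-wise into one row
end
Y = video_labels(:);                      % one label per video, as a column
```

Normalizing time to [0, 1] stretches or compresses every clip to the same length, which implicitly assumes the action spans the whole clip.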

For more sophisticated techniques, take a look at spectrograms.
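Whichever formatting you pick, the result is a numeric matrix X with one row per observation and a label vector Y with one entry per row, which is the shape MATLAB's SVM trainers expect. As a rough sketch, assuming the Statistics and Machine Learning Toolbox is installed: fitcsvm covers the two-class case, while fitcecoc combines binary SVM learners for more than two action classes. The data below is synthetic and only stands in for your real features.

```matlab
% Synthetic stand-in for formatted data: 30 observations, 4 features, 3 classes
rng(0);                                   % make the toy data reproducible
Y = repelem((1:3)', 10);                  % 10 observations per class, as a column
X = randn(30, 4) + repmat(3 * Y, 1, 4);   % shift each class so they are separable

% Train a multiclass model; fitcecoc uses SVM binary learners by default
model = fitcecoc(X, Y);

% Predict labels (here on the training data, just to show the call)
predicted = predict(model, X);
```

In practice you would hold out some videos for testing rather than predicting on the training set.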