I want to run detection and classification on video frames. However, a video contains too many frames to process them all, so I want to determine which frames contain objects and which are meaningless (containing no objects or faces). That way I can save time by running detection on fewer frames.
I have already tested GIST features with an SVM, trying to separate images containing dogs (PASCAL VOC) from forest scene images (15 Scene dataset), but the accuracy on the test data is very low (less than 50%).
Are there any other features or algorithms suitable for this task? Also, is there a dataset suitable for it?
You could look into visual saliency detection methods. If a frame's saliency map contains clusters of high-saliency pixels, the frame likely contains objects.
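As a rough illustration of the idea, here is a minimal sketch of one well-known saliency method, the spectral residual approach, written with NumPy only. The choice of this particular method, the function names, the box-filter smoothing, and the "3 times the mean" threshold are all assumptions made for the example, not something prescribed above. It expects a small grayscale frame as a 2-D array, so you would downsample each frame first.

```python
import numpy as np


def box_filter(a, k=3):
    """Mean filter with a k x k window (k odd), using edge padding."""
    pad = k // 2
    padded = np.pad(a, pad, mode="edge")
    h, w = a.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + h, dx:dx + w]
    return out / (k * k)


def spectral_residual_saliency(gray):
    """Return a saliency map scaled to [0, 1] for a 2-D grayscale array."""
    gray = np.asarray(gray, dtype=np.float64)
    spectrum = np.fft.fft2(gray)
    log_amp = np.log(np.abs(spectrum) + 1e-8)  # small constant avoids log(0)
    phase = np.angle(spectrum)
    # Spectral residual: log amplitude minus its local average.
    residual = log_amp - box_filter(log_amp, 3)
    # Back to the image domain, keeping the original phase.
    sal = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
    sal = box_filter(sal, 5)  # light smoothing; window size is an assumption
    peak = sal.max()
    return sal / peak if peak > 0 else sal


def salient_fraction(sal, factor=3.0):
    """Fraction of pixels whose saliency exceeds factor * mean saliency."""
    return float((sal > factor * sal.mean()).mean())


if __name__ == "__main__":
    # Synthetic 64x64 "frame": noisy background plus one bright patch.
    rng = np.random.default_rng(0)
    frame = rng.normal(0.5, 0.02, (64, 64))
    frame[20:28, 30:38] += 1.0
    sal = spectral_residual_saliency(frame)
    print("salient fraction:", salient_fraction(sal))
```

You could then pass only those frames whose salient fraction (or some other measure of clustering in the map) exceeds a cutoff to the full detector. That cutoff is something you would have to tune on your own videos.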