I need to extract a single "keyframe" from a video of a particular human action (the actions could be generic) such that it is discriminative rather than descriptive (see "Finding an interesting frame in a video").
In short, I need to find that one frame in a basketball video that discriminates it from say, a coffee-drinking video.
Most of the papers I've seen describe some kind of video summarization technique, but the frames extracted that way need not be the best ones for separating action categories. This is my stumbling block: at test time, I only have the test video to extract a keyframe from, yet I need some model that lets me pick the frame most different from videos of other action categories.
Although this is an interesting problem, it sounds ill-defined to me. You want a frame (there's a good chance there'll be more than one, so it's probably incorrect to talk about "the one frame") that distinguishes your test video from other videos, but you don't know what the other videos are. For example, what if your whole set consists of basketball videos? Without knowing (or at least having some reasonable expectation of) what the other videos are, this task is impossible even for a human.
One way I could think of involves a probabilistic model that helps you determine how likely a frame is to be unique. You could train this model on some existing set of videos: compare all the frames to each other using some similarity measure, and focus on the ones that occur least frequently. Then apply the model to a different (but similar) test set. YMMV.
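To make the idea a bit more concrete, here is a rough sketch of the "least frequent frame" scoring. It assumes each frame has already been turned into a feature vector (a colour histogram, CNN features, whatever you prefer); the function names, the cosine similarity measure and the 0.9 threshold are all my own illustrative choices, not anything standard.

```python
import numpy as np

def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a_unit = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_unit = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a_unit @ b_unit.T

def rarity_scores(test_frames, reference_frames, threshold=0.9):
    """Score each test frame by how rarely something similar occurs in the reference set.

    threshold is an assumed cut-off: reference frames with similarity >= threshold
    count as "the same kind of frame".
    """
    sims = cosine_similarity_matrix(test_frames, reference_frames)
    frequency = (sims >= threshold).mean(axis=1)  # fraction of similar reference frames
    return 1.0 - frequency                        # rare frames score close to 1

def pick_keyframe(test_frames, reference_frames, threshold=0.9):
    """Return the index of the rarest test frame together with all scores."""
    scores = rarity_scores(test_frames, reference_frames, threshold)
    return int(np.argmax(scores)), scores

# Toy data: the reference set only contains two kinds of frames.
reference = np.array([[1.0, 0.0, 0.0]] * 5 + [[0.0, 1.0, 0.0]] * 5)
test_video = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])
best_index, scores = pick_keyframe(test_video, reference)
```

In the toy data the second test frame looks like nothing in the reference set, so it gets the highest score. Whether this works on real video depends entirely on the features and on how representative the reference set is of the "other" videos.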
Lastly, you mentioned that you're interested in action categories, but you're focusing on frames, i.e. still images only. It may be useful to first segment the videos into shots (have a look at the link you posted) and then look for the unique shots. You could then pick your unique frame candidate from the unique shots.
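For the shot segmentation step, a simple baseline is to cut wherever the grey-level histogram changes sharply between consecutive frames. The sketch below assumes the video is already decoded into a NumPy array of greyscale frames with shape (T, H, W) and values in 0..255; the function names, the bin count and the cut threshold are placeholders I made up and would need tuning.

```python
import numpy as np

def frame_histograms(frames, bins=16):
    """Normalised grey-level histogram for each frame of a (T, H, W) array."""
    hists = np.stack(
        [np.histogram(frame, bins=bins, range=(0, 256))[0] for frame in frames]
    ).astype(float)
    return hists / hists.sum(axis=1, keepdims=True)

def segment_shots(frames, cut_threshold=0.5):
    """Split a video into shots, returned as (start, end) frame ranges, end exclusive.

    cut_threshold is an assumed value in [0, 1] applied to the histogram
    distance between consecutive frames.
    """
    hists = frame_histograms(frames)
    # Half the L1 distance between consecutive histograms, which lies in [0, 1].
    diffs = 0.5 * np.abs(np.diff(hists, axis=0)).sum(axis=1)
    cuts = np.flatnonzero(diffs > cut_threshold) + 1  # index of first frame of each new shot
    starts = np.concatenate(([0], cuts))
    ends = np.concatenate((cuts, [len(frames)]))
    return list(zip(starts.tolist(), ends.tolist()))

# Toy video: three dark frames followed by two bright frames.
video = np.concatenate([np.full((3, 4, 4), 10), np.full((2, 4, 4), 200)])
shots = segment_shots(video)
```

Each shot could then be summarised by, say, its mean histogram and compared against shots from other videos, with the keyframe taken from whichever shot turns out to be the most unusual. Hard cuts are the easy case; gradual transitions such as fades need something more careful than a single threshold.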
Good luck!