I am writing a C++ computer vision application that needs to sample an IP camera's video rather than play its stream. To clarify, streaming from an IP camera delivers a temporally compressed video stream that runs from a start point to an end point in time, usually encoded as H.264. In contrast, sampling a video camera means requesting a single image "now". If image requests occur fast enough that H.264 or similar is more efficient, then use that compression, but never deliver an "old image", one from before the current image request, to the library client.
Basically, the video library needs to provide a video sampling interface, not a video streaming interface. If the time between two requests for video samples is 5 minutes, the image returned is still the most recently produced image.
From my understanding of H.264 and IP video streams, and from writing applications with libavcodec for a few years, the most efficient way to satisfy these requirements is a two-thread architecture. One thread's job is to continually consume frames from the IP camera, while the second thread's job is to accept frames from the first thread and give only the latest video frame to the library client when it requests an image from the camera. The key to satisfying the library's requirements is that the video consuming thread runs separately from the library client's application. The first thread needs to spin consuming frames, both to keep the camera connection healthy and to keep the latest frame available for the library client.
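A minimal sketch of the two-thread arrangement I have in mind, using only the standard library. The names `Frame`, `LatestFrameSlot`, `CameraSampler`, and `FrameSource` are hypothetical, and the frame source is a plain callback standing in for whatever actually receives and decodes the camera stream:

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical decoded-frame type; a real one would hold the decoder's output.
struct Frame {
    std::uint64_t sequence = 0;        // increasing counter set by the source
    std::vector<std::uint8_t> pixels;  // placeholder for image data
};

// Single-slot mailbox: the writer overwrites, readers see only the newest frame.
class LatestFrameSlot {
public:
    void publish(std::shared_ptr<const Frame> frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        latest_ = std::move(frame);
    }
    // Returns nullptr until the first frame has been published.
    std::shared_ptr<const Frame> sample() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return latest_;
    }
private:
    mutable std::mutex mutex_;
    std::shared_ptr<const Frame> latest_;
};

// Owns the consuming thread; the library client only ever calls sample().
class CameraSampler {
public:
    // Assumption: the callback blocks until the next frame and returns it
    // (or nullptr on a transient failure).
    using FrameSource = std::function<std::shared_ptr<const Frame>()>;

    explicit CameraSampler(FrameSource source) : source_(std::move(source)) {
        assert(source_ && "a frame source callback is required");
        worker_ = std::thread([this] { run(); });
    }
    ~CameraSampler() {
        running_ = false;  // the loop exits after the current source_() call
        worker_.join();
    }
    CameraSampler(const CameraSampler&) = delete;
    CameraSampler& operator=(const CameraSampler&) = delete;

    std::shared_ptr<const Frame> sample() const { return slot_.sample(); }

private:
    void run() {
        while (running_) {
            if (auto frame = source_()) slot_.publish(std::move(frame));
        }
    }
    FrameSource source_;
    LatestFrameSlot slot_;
    std::atomic<bool> running_{true};
    std::thread worker_;
};
```

Because the slot holds a `shared_ptr<const Frame>`, the client can keep using a sampled frame for as long as it likes while the consuming thread keeps overwriting the slot with newer ones.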
If I attempted this with one thread, and the time between video samples were 5 minutes (or even 5 seconds), the camera might have dropped the stream because nothing was consuming it. If the stream were still alive, the receiving software would have to "stream past and throw away" any frames the camera had backlogged.
Basically, this "sampling" behavior is not something IP cameras, or video streaming in general, normally provide. Short of using a picture capture interface, supporting this behavior requires a "spin thread" that consumes frames so the most recently produced frame is available when the library client requests it. There is no "mode" or "live profile" for video streaming that supports a video sampling interface. One needs to create this in software, with a "video frame consuming thread" that runs separately from the main application. Is this correct thinking, or am I wrong somewhere?
Assuming your camera supports RTP, as most IP cameras do, I'd use a library like Live555 to receive its H.264 stream. Then I would lightly parse the H.264 stream to identify the frame types and frame boundaries within it. I would buffer one Group of Pictures (GOP), starting at an I-frame. When the next I-frame arrives, clear the buffer before storing it. When a sample request comes in, send the buffered H.264 data to an H.264 decoder and return the last frame that comes out of the decoder as the response to the request. I'd probably run the RTP receiver and buffer producer on one thread and the library request receiver on another thread. You have to do some sort of locking on the buffer, because you can't clear it while a decode is in progress.
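Here is a rough sketch of the GOP buffer part, not tied to Live555 or any particular decoder. `GopBuffer`, `NalUnit`, and `nalType` are names I made up for illustration. It assumes NAL units arrive already split with start codes stripped, one slice per picture, and that every GOP begins with an IDR slice (an encoder can also emit non-IDR I-frames, which this does not handle). The NAL unit type is the low 5 bits of the first byte: 1 is a non-IDR slice, 5 an IDR slice, 7 an SPS, 8 a PPS.

```cpp
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

using NalUnit = std::vector<std::uint8_t>;  // one NAL unit, start code removed

// H.264 nal_unit_type values used by this sketch.
enum NalType : std::uint8_t { kNonIdrSlice = 1, kIdrSlice = 5, kSps = 7, kPps = 8 };

// nal_unit_type is the low 5 bits of the NAL header byte.
inline std::uint8_t nalType(const NalUnit& nal) {
    return nal.empty() ? 0 : static_cast<std::uint8_t>(nal[0] & 0x1F);
}

class GopBuffer {
public:
    // Called on the receiver thread for every NAL unit.
    void push(NalUnit nal) {
        std::lock_guard<std::mutex> lock(mutex_);
        switch (nalType(nal)) {
            case kSps: sps_ = std::move(nal); break;  // remember parameter sets
            case kPps: pps_ = std::move(nal); break;
            case kIdrSlice:                           // new GOP: drop the old one
                gop_.clear();
                gop_.push_back(std::move(nal));
                break;
            case kNonIdrSlice:                        // useless without a preceding IDR
                if (!gop_.empty()) gop_.push_back(std::move(nal));
                break;
            default: break;                           // SEI, AUD, etc. are ignored here
        }
    }

    // Called on the request thread. Returns a copy (SPS, PPS, then the GOP so far)
    // so the lock is not held while decoding. Empty until the first IDR arrives.
    std::vector<NalUnit> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<NalUnit> out;
        if (gop_.empty()) return out;
        if (!sps_.empty()) out.push_back(sps_);
        if (!pps_.empty()) out.push_back(pps_);
        out.insert(out.end(), gop_.begin(), gop_.end());
        return out;
    }

private:
    mutable std::mutex mutex_;
    NalUnit sps_, pps_;
    std::vector<NalUnit> gop_;
};
```

Copying the GOP out under the lock is one way around the "can't clear while decoding" problem: the request thread decodes its own copy and keeps only the last frame the decoder produces, while the receiver thread stays free to clear and refill the buffer. The cost is a copy of up to one GOP per request, which may or may not matter at your request rate.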