Tags: multithreading, encoding, streaming, video-capture, video-processing

Video encoding pipeline - threads design


I work on a system which does video capture and encoding of multiple channels. Each stage takes time. The capture/encoding is done in HW, but it can still take a while to finish.

capture frames -> encode -> file-save (or stream to network)

I have a dilemma about which would be the better approach/design:

  1. One thread per channel, which calls the pipeline's non-blocking APIs one after the other and then sleeps until the next frame slot, such as:

    while (1)
    {
        frame = get_next_capture_frame();                                         // non-blocking; a new frame arrives every 1/60 sec
        prev_bitstream = send_to_encode_and_get_any_already_encoded_frame(frame); // non-blocking
        send_to_save_bitstream(prev_bitstream);                                   // non-blocking
        delay(1/60); // wait one frame period (1/60 sec)
    }
    

Or is it better to use several threads, each doing its own job: one thread for capture, another for encoding, and another for file management? This gets more complex as more channels are involved (about 6 channels, which would mean 6 threads in the first approach and 18 threads in the second).

  2. Another dilemma in this problem domain: should a thread wake up periodically and process whatever work is waiting in its queue (say, once every 1/60 second), or should it wake up in response to new events (a new buffer for capture, a new buffer for the encoder, etc.)?

Solution

  • It kind of depends on the requirements. If you know that you'll always have 6 channels at 60 FPS, and that the Capture/Encode/Save process will take less than 1/60 second, then one thread per channel is the easiest to code. But be aware that if encoding or saving sometimes takes too long, you won't get the next frame on schedule.
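
    For illustration, here is a minimal sketch of that one-thread-per-channel loop in C++. It paces itself against an absolute deadline from std::chrono::steady_clock (which avoids drift) and reports when a cycle overruns its 1/60-second budget. The capture/encode/save calls are the hypothetical APIs from your pseudocode, not a real SDK:

        #include <chrono>
        #include <cstdio>
        #include <thread>

        struct Frame;      // opaque handles; stand-ins for whatever the HW API uses
        struct Bitstream;

        // Hypothetical hardware APIs, as named in the question.
        Frame*     get_next_capture_frame();
        Bitstream* send_to_encode_and_get_any_already_encoded_frame(Frame*);
        void       send_to_save_bitstream(Bitstream*);

        void channel_loop()
        {
            using namespace std::chrono;
            const auto frame_period =
                duration_cast<steady_clock::duration>(duration<double>(1.0 / 60.0));
            auto deadline = steady_clock::now() + frame_period;

            for (;;) {
                Frame* frame = get_next_capture_frame();                                 // non-blocking
                Bitstream* bs = send_to_encode_and_get_any_already_encoded_frame(frame); // non-blocking
                send_to_save_bitstream(bs);                                              // non-blocking

                if (steady_clock::now() > deadline)
                    std::fprintf(stderr, "channel overran its frame budget\n");  // a frame slot will be missed

                std::this_thread::sleep_until(deadline);  // absolute deadline, so late cycles don't accumulate drift
                deadline += frame_period;
            }
        }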

    You could use a pipelined approach (similar to your second option), but not on a per-channel basis. That is, you could have a single thread that does nothing but read and store a frame from each channel 60 times per second. Those frames go into the Captured queue. You have a separate thread (or perhaps multiple threads) reading from the Captured queue, encoding the data, and placing the results on the Output queue. Finally, one or more output threads read the Output queue and save the encoded data.

    The queues are shared, so that all of the encoding threads, for example, read from the same Captured queue and write to the same Output queue. Most programming languages these days have efficient thread-safe queues, and many have variants that don't require busy waiting: the encoding threads can do a non-busy wait on the Captured queue when it's empty and will be notified when something is placed on it. See .NET's BlockingCollection or Java's LinkedBlockingQueue, for example.
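
    C++ doesn't ship a blocking queue in the standard library, so as a rough analogue of .NET's BlockingCollection, here is a minimal thread-safe queue built from std::mutex and std::condition_variable. pop() does a non-busy wait: the consumer sleeps until a producer notifies it.

        #include <condition_variable>
        #include <mutex>
        #include <queue>
        #include <utility>

        template <typename T>
        class BlockingQueue {
        public:
            void push(T item)
            {
                {
                    std::lock_guard<std::mutex> lock(mutex_);
                    queue_.push(std::move(item));
                }
                cv_.notify_one();  // wake one waiting consumer
            }

            T pop()  // blocks (without spinning) until an item is available
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return !queue_.empty(); });
                T item = std::move(queue_.front());
                queue_.pop();
                return item;
            }

        private:
            std::mutex mutex_;
            std::condition_variable cv_;
            std::queue<T> queue_;
        };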

    That model scales well. If, for example, you need more encoding threads to keep up with the throughput, you can simply add more of them. You might end up with, say, two capture threads, 8 encoders, and a single output thread, and you can balance the mix based on your workload.
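
    Wiring the stages together then becomes a question of how many threads you start per stage. Here is a sketch reusing the BlockingQueue above, with hypothetical CapturedFrame/EncodedPacket types; the real encoder and file/network calls would go where the comments indicate:

        #include <thread>
        #include <vector>

        struct CapturedFrame { int channel; /* plus raw frame data */ };
        struct EncodedPacket { int channel; /* plus encoded bitstream */ };

        BlockingQueue<CapturedFrame> captured;  // capture stage -> encode stage
        BlockingQueue<EncodedPacket> output;    // encode stage  -> output stage

        int main()
        {
            std::vector<std::thread> workers;

            // Tune these counts to the workload, e.g. 8 encoders and 1 output thread.
            for (int i = 0; i < 8; ++i)
                workers.emplace_back([] {
                    for (;;) {
                        CapturedFrame f = captured.pop();  // non-busy wait
                        EncodedPacket p{f.channel};        // ...run the HW encoder on f here...
                        output.push(p);
                    }
                });

            workers.emplace_back([] {
                for (;;) {
                    EncodedPacket p = output.pop();        // non-busy wait
                    // ...write p to file or stream it to the network here...
                }
            });

            // The capture thread(s) would run periodically (once per 1/60 sec)
            // and call captured.push(...) for each channel.
            for (auto& t : workers) t.join();
        }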

    As for the scheduling, I suspect you'd want your capture thread(s) to operate on a periodic basis (i.e. once every 1/60 second, or whatever your frame rate is). The encoding and output threads should be configured to wait on their respective queues. There's no reason for the output thread, for example, to continually poll the Output queue for data. Instead, it can be idle (waiting) and get notified when a packet is placed in the Output queue.

    The details of how to do this for a video encoder might make the approach unnecessarily messy; I really don't know. If the encoder requires channel-specific state, it becomes more difficult, especially since this model allows two encoders to be working on frames from the same channel at once. If that happens, you need some way to sequence the output.
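
    Purely as a sketch of one way to handle that sequencing problem: tag each frame with a per-channel sequence number at capture time, and let the output stage buffer packets until the next expected number for each channel arrives. The names here are hypothetical, and EncodedPacket is the type from the sketch above:

        #include <cstdint>
        #include <map>
        #include <utility>

        // Reordering buffer for one channel: packets may finish encoding out of
        // order, but must be written in capture order.
        class ChannelReorderBuffer {
        public:
            // Hand in a packet with its capture-time sequence number; write() is
            // called for every packet that is now deliverable, in order.
            template <typename WriteFn>
            void deliver(uint64_t seq, EncodedPacket pkt, WriteFn write)
            {
                pending_.emplace(seq, std::move(pkt));
                // Flush the run of consecutive sequence numbers starting at next_.
                while (!pending_.empty() && pending_.begin()->first == next_) {
                    write(pending_.begin()->second);
                    pending_.erase(pending_.begin());
                    ++next_;
                }
            }

        private:
            uint64_t next_ = 0;                         // next sequence number to write out
            std::map<uint64_t, EncodedPacket> pending_; // out-of-order packets, keyed by seq
        };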

    Your first approach is the simplest, and it's what I'd do for the first cut of my program. If that can't maintain the throughput you need, then I'd consider more complex approaches.