I'm in the process of creating a very basic video player with the ffmpeg libraries and I have all the decoding and re-encoding in place, but I'm stuck on audio/video synchronization.
My problem is that movies have audio and video streams muxed (interleaved) so that audio and video come in "bursts" (a number of audio packets followed by a number of video frames), like this, where each packet has its own timestamp:
A A A A A A A A V V V V A A A A A A A V V V V ...
A: decoded and re-encoded audio data chunk
V: decoded and re-encoded video frame
presumably to prevent too much audio from being processed without video, and the other way around.
Now I have to decode the "bursts" and send them to the audio/video playing components in a timely fashion, and I am a bit lost in the details.
Because I don't expect anything like this:
AAAAAAAAAAA .... AAAAAAAAAAAAA x10000 VVVVVVVVVVVVVV x1000
audio for the whole clip followed by video
or this:
VVVVVVVVVVVV x1000 AAAAAAAAAAA...AAAAAAAAA x1000
all video frames followed by the audio
to happen in a well-encoded video (after all, preventing such extremes is what muxing is all about...)
Thanks!
UPDATE: since my description might have been unclear, the issue is not with how the streams are laid out, or with how to decode them: the whole audio/video demuxing, decoding, rescaling and re-encoding is in place and working, and each chunk of data has its own timestamp.
My problem is what to do with the decoded data without incurring buffer overruns and underruns and, generally, without clogging my pipeline, so I guess it might be considered a "scheduling" problem.
Sync is the job of the container. Every frame is timestamped with a PTS/DTS or a duration/CTS.
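To make the scheduling side concrete, here is a minimal sketch of one common approach, not the only one: convert each PTS to seconds using the stream's time base, keep decoded frames in a bounded queue, and release video frames against the audio clock. It is plain Python rather than ffmpeg code; `BoundedQueue`, `video_action`, `drain_video` and the 40 ms `LATE_TOLERANCE` are hypothetical names and values chosen for illustration, and the audio clock is assumed to be reported by whatever plays the audio.

```python
from collections import deque
from fractions import Fraction

# Hypothetical threshold (an assumption, not an ffmpeg value): a frame more
# than this many seconds behind the audio clock is dropped instead of shown.
LATE_TOLERANCE = 0.040

def pts_to_seconds(pts, time_base):
    """Convert a timestamp expressed in time_base units to seconds."""
    return float(pts * time_base)

def video_action(frame_time, audio_clock, tolerance=LATE_TOLERANCE):
    """Decide what to do with a video frame given the current audio clock."""
    delta = frame_time - audio_clock
    if delta > 0:
        return "wait"   # frame is in the future: leave it queued
    if delta < -tolerance:
        return "drop"   # frame is too late: skip it to catch up
    return "show"       # frame is due now

class BoundedQueue:
    """FIFO with a fixed capacity; the reader should pause while it is full."""
    def __init__(self, capacity):
        self.items = deque()
        self.capacity = capacity

    def full(self):
        return len(self.items) >= self.capacity

    def push(self, item):
        if self.full():
            return False  # back-pressure: caller must stop demuxing for now
        self.items.append(item)
        return True

    def peek(self):
        return self.items[0] if self.items else None

    def pop(self):
        return self.items.popleft()

def drain_video(queue, audio_clock):
    """Pop every queued frame time that is due; return (time, action) pairs."""
    handled = []
    while queue.peek() is not None:
        frame_time = queue.peek()
        action = video_action(frame_time, audio_clock)
        if action == "wait":
            break
        queue.pop()
        handled.append((frame_time, action))
    return handled
```

In a loop of this shape, the demuxer would read packets only while neither the audio queue nor the video queue is full, so a burst of one stream is absorbed by its queue rather than overrunning it, and the video side calls `drain_video` each time the audio clock advances. The capacities and the tolerance would need tuning for a real player.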