I am developing a WinForms application in VB.NET where I need to parse an MP4 file (specifically, MP4 Version 1 as defined in ISO/IEC 14496-12, with the H.264/AVC codec), extract frames as images, and store them. I am not using any NuGet packages or external libraries for this project.
When working with test videos that do not contain audio, I've noticed that the number of NALUs obtained from the parsing function matches the number of frames, sometimes with an additional NALU when its type is 6.
However, I encounter a problem with videos that do contain audio. The NALUnitLength sometimes exceeds the file size, or (tempPos + NALUnitLength) > mdatEndPos. When I attempt to brute-force the following NALU in such cases, I sometimes obtain nonsensical results.
The allowed NALU types are 0 to 13, 19, and 24 to 31.
Unfortunately, the documentation does not provide any information on this matter.
Can anyone provide assistance or insights on this issue?
Thank you!
Frames are mapped into NAL Units, but there is additional metadata that may be essential to decode the video data. For example, PPS, SPS, and SEI are all NAL Units too.
Sometimes a frame is broken down into multiple slices and then you can have multiple video NAL Units per frame.
Further - interlaced video can be encoded as separate fields, resulting in at least two video NAL Units per frame.
In short - there is no 1:1 relationship between frame count and NAL Unit count.
Multiple NAL Units make up one Access Unit, and an Access Unit is usually a frame.
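To make the point concrete, here is a minimal sketch (in Python for brevity; it ports to VB.NET line by line) that splits one MP4 sample into its NAL Units and counts how many of them start a new picture. It rests on several assumptions: the bytes of a single video sample have already been located, each NAL Unit is prefixed by a 4-byte big-endian length (the real size comes from lengthSizeMinusOne in the avcC box), and the function names are hypothetical, not taken from any library. The picture-start test is only a heuristic: it checks whether first_mb_in_slice is 0, so it counts each field of field-coded interlaced video separately.

```python
def split_nal_units(sample: bytes, length_size: int = 4):
    """Split one length-prefixed MP4 sample into a list of NAL Units.

    length_size is assumed to be 4; read lengthSizeMinusOne + 1 from avcC.
    """
    units = []
    pos = 0
    while pos + length_size <= len(sample):
        nal_len = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        if pos + nal_len > len(sample):
            # Same symptom as in the question: the length overruns the data.
            raise ValueError(f"NAL length {nal_len} overruns sample at offset {pos}")
        if nal_len > 0:
            units.append(sample[pos:pos + nal_len])
        pos += nal_len
    return units


def nal_type(nal: bytes) -> int:
    """nal_unit_type is the low 5 bits of the first byte of the NAL Unit."""
    return nal[0] & 0x1F


def is_vcl(nal: bytes) -> bool:
    """Types 1 to 5 carry coded slice data; 6 is SEI, 7 is SPS, 8 is PPS."""
    return 1 <= nal_type(nal) <= 5


def starts_new_picture(nal: bytes) -> bool:
    """Heuristic: first_mb_in_slice is the first ue(v) field of the slice
    header, and the value 0 is coded as a single '1' bit."""
    return is_vcl(nal) and len(nal) > 1 and (nal[1] & 0x80) != 0


def build_sample(nal_units, length_size: int = 4) -> bytes:
    """Helper for the demo only: prefix each NAL Unit with its length."""
    return b"".join(len(n).to_bytes(length_size, "big") + n for n in nal_units)


# Demo with made-up bytes: one SEI followed by two slices of the same picture.
sei = bytes([0x06, 0x05, 0x01, 0x80])   # type 6 (SEI)
slice_a = bytes([0x65, 0x88, 0x84])     # type 5, first_mb_in_slice == 0
slice_b = bytes([0x65, 0x48, 0x84])     # type 5, first_mb_in_slice != 0
sample = build_sample([sei, slice_a, slice_b])

units = split_nal_units(sample)
types = [nal_type(u) for u in units]
pictures = sum(starts_new_picture(u) for u in units)
print(len(units), types, pictures)
```

In this demo three NAL Units yield a single picture, which is why counting NAL Units does not give the frame count. If split_nal_units raises on real data, one plausible cause (an assumption, not something confirmed for your files) is that the read position is not at the start of a video sample, for example because audio data is interleaved in mdat.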
This specification covers in detail how H.264 is mapped into MP4:
https://www.iso.org/standard/83336.html
This specification shows all the details about H.264: