I am experimenting with generating thumbnails for web videos. My plan is to extract I-frames from the binary stream by mimicking how a decoder works, prepend the SPS and PPS of the original video to form a raw H.264 stream, and hand that to ffmpeg to produce an image. I have solved most of the problems and even wrote a demo that does this, but I cannot find any information on how to tell when multiple NALUs together form one frame (strictly speaking there is a little, but it does not solve my problem; I explain below).
You can use the following command to generate the type of video I mentioned:
ffmpeg -i input.mp4 -c:v libx264 -x264-params slices=8 output.mp4
This produces a video with 8 slices per frame. Since I will use this file later, I also extract the raw H.264 stream with the following command:
ffmpeg -i output.mp4 -vcodec copy -an output.h264
When I load it into an analysis program, I can see several consecutive IDR NALUs, and in every one except the first, first_mb_in_slice in the slice header is not 0.
But when I go back to the mdat box of the MP4 and look at the NALUs there, every first_mb_in_slice appears to be 0.
Take the byte 0x9a = 1001 1010 as an example. By Exp-Golomb decoding, first_mb_in_slice == 0 (ue(1b) == 0) and slice_type == 5, a P slice (ue(00110b) == 5). Yet when I apply the same algorithm to the raw H.264 file, my results match what the analysis program reports.
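The Exp-Golomb reading of 0x9a can be checked with a minimal Python sketch. The BitReader class and the function name are illustrative only (they come from no library), and the sketch assumes the byte is the first byte of the slice header, directly after the one-byte NAL header.

```python
class BitReader:
    """Reads bits MSB-first from a bytes object."""

    def __init__(self, data):
        self.data = data
        self.pos = 0  # current bit position

    def read_bit(self):
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_bits(self, n):
        value = 0
        for _ in range(n):
            value = (value << 1) | self.read_bit()
        return value

    def read_ue(self):
        """Decode one unsigned Exp-Golomb code, ue(v)."""
        leading_zeros = 0
        while self.read_bit() == 0:
            leading_zeros += 1
        return (1 << leading_zeros) - 1 + self.read_bits(leading_zeros)


def parse_slice_header_start(payload):
    """Return (first_mb_in_slice, slice_type) from the start of a slice header."""
    reader = BitReader(payload)
    first_mb_in_slice = reader.read_ue()
    slice_type = reader.read_ue()
    return first_mb_in_slice, slice_type


print(parse_slice_header_start(bytes([0x9A])))  # (0, 5)
```

The value 0 is coded as the single bit 1, so any slice whose first header bit is 0 has a non-zero first_mb_in_slice.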
Is this information signalled anywhere else? Given an arbitrary NALU, can I tell whether the video is sliced, or am I doing something wrong?
PS: Feeding a single NALU to the decoder works, but only 1/8 of the image is decoded.
If you need a reference, the address of the demo program I wrote is: https://github.com/gaowanliang/web-video-thumbnailer
" I plan to extract I-frames"
Make sure you go for IDR keyframes rather than just any I-frame, since an IDR picture is guaranteed to decode into a complete image on its own. A non-IDR I-frame does not reset the decoder's reference state, so it is not guaranteed to be a clean entry point into the stream.
"I can't find any information about where there is an identifier when multiple NALUs form one frame"
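For the MP4 file:
(1) Using SEI (NALU type 6, usually seen as the header byte 0x06):
solution: Find the text slices= in the SEI text.
The MP4 might contain an SEI NALU, which will be in front of the bytes of the first video frame.
When libx264 is the encoder, it writes its settings into that SEI as plain text, and those settings include a "slices=" entry. This is specific to libx264, so other encoders will not provide it.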
An example of SEI text in a libx264 encode:
cabac=1 ref=3 deblock=1:0:0 analyse=0x3:0x113 me=hex ... (other texts)
sliced_threads=0 slices=8 nr=0 decimate=1 ... (other texts)
constrained_intra=0 bframes=3 b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 ... (other texts)
Usually IDR is one big slice (ie: Really libx264? We slice IDR frames now? For what benefit though?).
Using -x264-params slices=8 overrides the one-slice default. As you can see, there is now a "slices=8" text entry, which tells us to expect 8 slice NAL units per frame (IDR frames included).
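A minimal Python sketch of the SEI text search might look like this. It assumes you have already located the SEI NALU and pass its raw bytes in; the function name is hypothetical, and a missing entry only means the count is not stated there.

```python
import re


def slices_from_x264_sei(sei_bytes):
    """Return the slices= value from x264's settings text, or None if absent.

    The lookbehind avoids matching inside a longer option name.
    """
    match = re.search(rb"(?<![a-z_])slices=(\d+)", sei_bytes)
    return int(match.group(1)) if match else None


# Example using a fragment of the settings text quoted above.
sample = b"me=hex sliced_threads=0 slices=8 nr=0 decimate=1"
print(slices_from_x264_sei(sample))  # 8
```

Since this text is written by the encoder and can be stripped, it is safer to treat the result as a hint than as a guarantee.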
(2) Using STSS and STSZ:
solution: A size (or bytes-length) listed in STSZ will include all NALU slices per frame.
STSS is the bytes 73 74 73 73 == integer 0x73747373.
STSZ is the bytes 73 74 73 7A == integer 0x7374737A.
The MP4's video track will have a Sample Table. It has an STSS section that lists all IDR keyframes, and an STSZ section that lists the bytes-length of every frame.
Using these two sections of the MP4 header, you can find out which frame numbers are IDR, then look up each one's size by matching the frame number to the STSZ entry with the same number.
Extract a keyframe using the size shown in STSZ to get a full frame (with all NALU slices).
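The lookup described above could be sketched in Python as follows. The function names are hypothetical, each payload is assumed to start right after the 8-byte box header (size plus type), and split_length_prefixed assumes the common 4-byte NALU length prefix used inside MP4 samples (the real prefix size is declared in the avcC box).

```python
import struct


def parse_stss(payload):
    """Return the 1-based sample numbers of the sync samples (keyframes)."""
    entry_count = struct.unpack_from(">I", payload, 4)[0]  # skip version/flags
    return list(struct.unpack_from(f">{entry_count}I", payload, 8))


def parse_stsz(payload):
    """Return the size in bytes of every sample."""
    sample_size, sample_count = struct.unpack_from(">II", payload, 4)
    if sample_size != 0:  # every sample shares one fixed size
        return [sample_size] * sample_count
    return list(struct.unpack_from(f">{sample_count}I", payload, 12))


def keyframe_sizes(stss_payload, stsz_payload):
    """Map each keyframe's sample number to its size from stsz."""
    sizes = parse_stsz(stsz_payload)
    return {n: sizes[n - 1] for n in parse_stss(stss_payload)}


def split_length_prefixed(sample, length_size=4):
    """Split one MP4 sample into its NALUs (assumed length-prefixed)."""
    nalus, pos = [], 0
    while pos + length_size <= len(sample):
        n = int.from_bytes(sample[pos:pos + length_size], "big")
        pos += length_size
        nalus.append(sample[pos:pos + n])
        pos += n
    return nalus
```

Because NALUs in mdat carry a length prefix instead of a start code, the byte after the prefix is the NAL header, and the byte after that is where first_mb_in_slice begins.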
For the raw H.264 file:
(1) Using SEI (NALU type 6, usually seen as the header byte 0x06):
solution: Find the text slices= in the SEI text.
The same process as with MP4 (as explained above).
(2) Using first_mb_in_slice:
solution: The value of first_mb_in_slice is 0 for the first slice of a frame and non-zero for all other slices of that frame.
first_mb_in_slice is Exp-Golomb coded, so the value 0 is stored as the single bit 1, and every non-zero value starts with a 0 bit. You can therefore test it by checking the first bit of the byte that follows the NALU header (0x65 for an IDR slice).
In the case of 8 slices (per frame), when you find an IDR frame, that first bit will be 1 for the first slice (first_mb_in_slice == 0), followed by another 7 IDR units that each start with a 0 bit (first_mb_in_slice > 0).
You will know that you have enough NALUs for one frame when the first bit of the next slice NALU becomes 1 again (it means a new picture has started).
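As a rough Python sketch, slice NALUs in a raw Annex B stream can be grouped into frames by treating first_mb_in_slice == 0 (which Exp-Golomb coding stores as a leading 1 bit) as the start of a new picture. The function names are hypothetical, and the sketch assumes slices arrive in order and that each picture is a whole frame; the H.264 specification defines a more complete picture-boundary test that also compares fields such as frame_num.

```python
import re

VCL_TYPES = {1, 5}  # 1 = non-IDR slice, 5 = IDR slice


def split_annexb(stream):
    """Split an Annex B byte stream into NAL units (start codes removed)."""
    parts = re.split(rb"\x00\x00\x01", stream)
    # A 4-byte start code leaves one zero byte at the end of the previous
    # unit; a NAL unit never ends in 0x00, so stripping it is safe.
    units = (part.rstrip(b"\x00") for part in parts[1:])
    return [unit for unit in units if unit]


def starts_new_picture(nalu):
    """True if the first slice-header bit is 1, i.e. first_mb_in_slice == 0."""
    return len(nalu) > 1 and (nalu[1] & 0x80) != 0


def group_slices(nalus):
    """Group slice NALUs into pictures; non-slice NALUs are skipped here."""
    pictures = []
    for nalu in nalus:
        if (nalu[0] & 0x1F) not in VCL_TYPES:
            continue  # SPS, PPS, SEI and others are ignored in this sketch
        if starts_new_picture(nalu) or not pictures:
            pictures.append([])
        pictures[-1].append(nalu)
    return pictures
```

For the 8-slice file from the question, each group returned for a frame would be expected to hold 8 NALUs, and the SPS and PPS still need to be prepended before handing a group to ffmpeg.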