Search code examples
objective-cmacosh.264avcapturesession

AVSampleBufferDisplayLayer renders half of each frame when using high preset


I am using AVSampleBufferDisplayLayer to display video that is being streamed over the network. On the sending side an AVCaptureSession is used to capture CMSampleBuffers which are serialized into NAL units and streamed to the receiver, which then turns them back into CMSampelBuffers and feeds them to AVSampleBufferDisplayLayer (as is described for instance here). It works quite well - I can see the video and it streams more or less smoothly.

If I set the capture session's sessionPreset to AVCaptureSessionPresetHigh the video shown on the receiving side is cut in half - the top half displays the video from the sender while the bottom half is a solid dark green. If I use any other preset (e.g. AVCaptureSessionPresetMedium or AVCaptureSessionPreset1280x720) the video displays in its entirety.

Has anyone encountered such an issue, or has any idea what might cause it?

I tried examining the data at the source as well as the data at the destination, to see if I can determine where the image is being chopped off, but I have not been successful. It occurred to me that perhaps the high quality frame is being split into more than one NALUs and I am not putting it together correctly - is that possible? How does such splitting look like on the elementary-stream level (if possible at all)?

Thank you

Amos


Solution

  • The problem turned out to be that at the preset AVCaptureSessionPresetHigh one frame would get split into more than one type 5 (or type 1) NALU. On the receiving side I was combining the SPS, PPS and type 1 (or type 5) NAL units into a CMSampleBuffer, but ignoring the second part of the frame if they were split, which caused the problem.

    In order to recognize if two successive NAL units belong to the same frame it is necessary to parse the slice header of the picture NAL units. This requires delving into the specification, but goes more or less like this: the first field of the slice header is first_mb_in_slice which is encoded in Golomb encoding. Next come slice_type and the pic_aprameter_set_id, also in Golomb encoding, and finally the frame_number, as an unsigned integer of length (log2_max_frame_num_minus_4 + 4) bits (to get the value of log2_max_frame_num_minus_4 it is necessary to parse the PPS corresponding to this frame). If two consecutive NAL units have the same frame_num they are part of the same frame and should be put into the same CMSampleBuffer.