I have an AXIS IP camera (M1054) which sends an H264/RTP stream via RTSP.
Unfortunately, the camera does not send SPS and PPS NALUs at all; it only transfers (fragmented) coded slices.
I'm trying to decode that stream with the iOS VideoToolbox framework, which needs the H264 SPS and PPS tuple to correctly set up the CMFormatDescription.
How can I synthesize the necessary parameter sets by looking at the actual H264 slices?
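Concretely, I need the parameter sets to make a call roughly like the following sketch, where `sps` and `pps` stand for the raw parameter set bytes (without Annex B start codes) that I am missing:

```swift
import CoreMedia

// Build a CMVideoFormatDescription from raw SPS/PPS bytes
// (no Annex B start codes).
func makeFormatDescription(sps: [UInt8], pps: [UInt8]) -> CMVideoFormatDescription? {
    var formatDescription: CMVideoFormatDescription?
    let status: OSStatus = sps.withUnsafeBufferPointer { spsBuf in
        pps.withUnsafeBufferPointer { ppsBuf in
            // One pointer and one length per parameter set.
            let pointers: [UnsafePointer<UInt8>] = [spsBuf.baseAddress!, ppsBuf.baseAddress!]
            let sizes: [Int] = [sps.count, pps.count]
            return CMVideoFormatDescriptionCreateFromH264ParameterSets(
                allocator: kCFAllocatorDefault,
                parameterSetCount: 2,
                parameterSetPointers: pointers,
                parameterSetSizes: sizes,
                nalUnitHeaderLength: 4,   // 4-byte AVCC length prefixes on the slices
                formatDescriptionOut: &formatDescription)
        }
    }
    return status == noErr ? formatDescription : nil
}
```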
Update: Using Wireshark, I have captured an example session in which mplayer manages to display the stream. The capture file is here; it shows the whole RTSP setup as well as a couple of seconds of RTP. The RTP traffic consists of 3 sets of flows.
Although the SPS/PPS are often in-band inside the stream and transported via RTP - they don't need to be there (and maybe shouldn't be there). The SPS/PPS are transmitted as part of the setup process (RTSP): the SDP returned by the DESCRIBE request carries them base64-encoded in the sprop-parameter-sets attribute of the video fmtp line (see RFC 6184, the RTP payload format for H.264). I usually recommend running http://www.live555.com/ in the debugger to learn about the details of the process - but http://www.live555.com/ is currently down.
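For illustration, the relevant exchange looks roughly like this (a hypothetical transcript - the URL, payload type, and base64 values are invented examples, not taken from the capture above):

```
C->S: DESCRIBE rtsp://192.168.0.90/axis-media/media.amp RTSP/1.0
      CSeq: 2
      Accept: application/sdp

S->C: RTSP/1.0 200 OK
      CSeq: 2
      Content-Type: application/sdp

      v=0
      ...
      m=video 0 RTP/AVP 96
      a=rtpmap:96 H264/90000
      a=fmtp:96 packetization-mode=1;profile-level-id=42001e;sprop-parameter-sets=Z0IAHukBQHsg,aM4xUg==
```

Base64-decoding the two comma-separated values of sprop-parameter-sets yields the raw SPS and PPS NAL units - exactly what VideoToolbox needs.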
In very rare circumstances you could recreate the SPS/PPS from a well-known, constrained H.264 stream. But in general you can't: the SPS/PPS are metadata of the H.264 stream that are not redundantly stored anywhere else.
So if you familiarize yourself with the setup process - RTSP - it will be pretty obvious.
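As a minimal sketch of the client side, assuming you already have the fmtp line from the DESCRIBE response (the line below reuses the hypothetical values from the transcript above), extracting the parameter sets in Swift looks roughly like this:

```swift
import Foundation

// Hypothetical fmtp line; real values come from the camera's
// RTSP DESCRIBE response.
let fmtpLine = "a=fmtp:96 packetization-mode=1;profile-level-id=42001e;sprop-parameter-sets=Z0IAHukBQHsg,aM4xUg=="

// Extract the raw SPS/PPS bytes from an SDP fmtp attribute line.
func parameterSets(fromFmtp line: String) -> (sps: Data, pps: Data)? {
    guard let range = line.range(of: "sprop-parameter-sets=") else { return nil }
    // The attribute value runs up to the next ';' (or end of line)
    // and holds comma-separated base64 strings: SPS first, then PPS.
    let value = line[range.upperBound...].split(separator: ";").first ?? ""
    let parts = value.split(separator: ",").map(String.init)
    guard parts.count >= 2,
          let sps = Data(base64Encoded: parts[0]),
          let pps = Data(base64Encoded: parts[1]) else { return nil }
    return (sps, pps)
}

if let (sps, pps) = parameterSets(fromFmtp: fmtpLine) {
    print("SPS: \(sps.count) bytes, PPS: \(pps.count) bytes")
}
```

The resulting sps and pps bytes are what you then pass to CMVideoFormatDescriptionCreateFromH264ParameterSets, as in the sketch in the question.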