ios, decode, h.264, rtsp, live555

How to decode a live555 rtsp stream (h.264) MediaSink data using iOS8's VideoToolbox?


Ok, I know that this question is almost the same as get-rtsp-stream-from-live555-and-decode-with-avfoundation, but VideoToolbox is now public in iOS 8, and although I know decoding can be done with this framework, I have no idea how to do it.

My goals are:

  • Connect to a WiFi camera using the RTSP protocol and receive the stream data (done with live555)
  • Decode the data and convert it to UIImages to display on the screen (Motion JPEG-like)
  • Save the streamed data to a .mov file

I reached all these goals using ffmpeg, but unfortunately I can't use it due to my company's policy.

I know that I can display it on the screen using OpenGL too, but this time I have to convert it to UIImages. I also tried the libraries below:

  • ffmpeg: can't use this time due to company's policy. (don't ask me why)

  • libVLC: the display lags by about 2 seconds and I don't have access to the stream data to save it into a .mov file...

  • gstreamer: same as above

I believe that live555 + VideoToolbox will do the job, I just can't figure out how to make it happen...


Solution

  • I did it. VideoToolbox is still poorly documented and there isn't much information out there about video programming (without using ffmpeg), so it cost me more time than I expected.

    For a stream coming from live555, I got the SPS and PPS info and used them to create the CMVideoFormatDescription like this:

    // spsData / ppsData hold the raw SPS and PPS NAL units (no start codes)
    const uint8_t *props[] = {[spsData bytes], [ppsData bytes]};
    size_t sizes[] = {[spsData length], [ppsData length]};

    CMVideoFormatDescriptionRef videoFormat = NULL;
    // 2 parameter sets, 4-byte NAL unit length prefix
    OSStatus result = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL, 2, props, sizes, 4, &videoFormat);
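
    For reference, here is a minimal sketch of how the spsData / ppsData used above can be pulled out of live555's sprop-parameter-sets (in an Objective-C++ file, since live555 is C++). It assumes subsession is the H.264 MediaSubsession you are already receiving from; record 0 being the SPS and record 1 the PPS is the usual order, but checking the NAL unit type is safer:

    #include "liveMedia.hh" // MediaSubsession, SPropRecord, parseSPropParameterSets

    unsigned numSPropRecords = 0;
    SPropRecord *records = parseSPropParameterSets(subsession->fmtp_spropparametersets(),
                                                   numSPropRecords);
    NSData *spsData = nil, *ppsData = nil;
    if (numSPropRecords >= 2) {
        // record 0 is normally the SPS and record 1 the PPS (check the NAL type if unsure)
        spsData = [NSData dataWithBytes:records[0].sPropBytes length:records[0].sPropLength];
        ppsData = [NSData dataWithBytes:records[1].sPropBytes length:records[1].sPropLength];
    }
    delete[] records; // parseSPropParameterSets allocates the array with new[]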
    

    Now, the difficult part (because I'm a noob at video programming): replace the NAL unit header with a 4-byte length code, as described here

    int headerEnd = 23; // offset in rawData where the real data starts
    uint32_t hSize = (uint32_t)([rawData length] - headerEnd - 4); // payload size after dropping headerEnd + 4 bytes
    uint32_t bigEndianSize = CFSwapInt32HostToBig(hSize);
    // Prepend the payload size as a big-endian 4-byte length prefix...
    NSMutableData *videoData = [NSMutableData dataWithBytes:&bigEndianSize length:sizeof(bigEndianSize)];

    // ...then append the NAL unit payload itself
    [videoData appendData:[rawData subdataWithRange:NSMakeRange(headerEnd + 4, [rawData length] - headerEnd - 4)]];
    

    Now I was able to create a CMBlockBuffer successfully from this raw data and pass the buffer to VTDecompressionSessionDecodeFrame. From there it is easy to convert the resulting CVImageBufferRef to a UIImage... I used this Stack Overflow thread as a reference.
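
    The code for this step is not shown above, so here is a minimal sketch of the block buffer / decode part, assuming videoFormat is the CMVideoFormatDescriptionRef created from the SPS/PPS and videoData is the length-prefixed NAL unit built above. The function names (didDecompress, createSession, decodeFrame) are just placeholders and error handling is omitted:

    #import <UIKit/UIKit.h>
    #import <CoreMedia/CoreMedia.h>
    #import <CoreImage/CoreImage.h>
    #import <VideoToolbox/VideoToolbox.h>

    // Output callback: VideoToolbox calls this once per decoded frame
    static void didDecompress(void *refCon, void *frameRefCon, OSStatus status,
                              VTDecodeInfoFlags flags, CVImageBufferRef imageBuffer,
                              CMTime pts, CMTime duration)
    {
        if (status != noErr || imageBuffer == NULL) return;
        UIImage *image = [UIImage imageWithCIImage:
                             [CIImage imageWithCVPixelBuffer:imageBuffer]];
        // hand `image` to the main thread for display, and/or keep it for recording
    }

    // Create the session once, reusing the videoFormat built from the SPS/PPS
    static VTDecompressionSessionRef createSession(CMVideoFormatDescriptionRef videoFormat)
    {
        VTDecompressionOutputCallbackRecord cb = { didDecompress, NULL };
        VTDecompressionSessionRef session = NULL;
        VTDecompressionSessionCreate(kCFAllocatorDefault, videoFormat, NULL, NULL,
                                     &cb, &session);
        return session;
    }

    // Per frame: wrap the 4-byte-length-prefixed NAL unit (videoData) and decode it
    static void decodeFrame(VTDecompressionSessionRef session,
                            CMVideoFormatDescriptionRef videoFormat, NSData *videoData)
    {
        CMBlockBufferRef blockBuffer = NULL;
        CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                           (void *)[videoData bytes], [videoData length],
                                           kCFAllocatorNull, // videoData keeps ownership of the bytes
                                           NULL, 0, [videoData length], 0, &blockBuffer);

        CMSampleBufferRef sampleBuffer = NULL;
        const size_t sampleSize = [videoData length];
        CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, NULL,
                             videoFormat, 1, 0, NULL, 1, &sampleSize, &sampleBuffer);

        VTDecompressionSessionDecodeFrame(session, sampleBuffer, 0, NULL, NULL);

        CFRelease(sampleBuffer);
        CFRelease(blockBuffer);
    }

    Note that VTDecompressionSessionDecodeFrame runs synchronously unless you pass kVTDecodeFrame_EnableAsynchronousDecompression, so videoData only has to stay alive for the duration of the call even though the block buffer does not copy its bytes.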

    And finally, save the stream data, converted to UIImages, following the explanation described in How do I export UIImage array as a movie?
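
    Roughly, that approach boils down to an AVAssetWriter plus a pixel buffer adaptor. In this sketch outputURL, width, height, frameIndex and pixelBuffer are assumed to exist, 30 fps is assumed, and the UIImage to CVPixelBufferRef conversion is the one described in the linked answer:

    #import <AVFoundation/AVFoundation.h>

    // One-time writer setup
    NSError *error = nil;
    AVAssetWriter *writer = [[AVAssetWriter alloc] initWithURL:outputURL
                                                      fileType:AVFileTypeQuickTimeMovie
                                                         error:&error];
    NSDictionary *settings = @{ AVVideoCodecKey  : AVVideoCodecH264,
                                AVVideoWidthKey  : @(width),
                                AVVideoHeightKey : @(height) };
    AVAssetWriterInput *input =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                           outputSettings:settings];
    AVAssetWriterInputPixelBufferAdaptor *adaptor =
        [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:input
                                                                  sourcePixelBufferAttributes:nil];
    [writer addInput:input];
    [writer startWriting];
    [writer startSessionAtSourceTime:kCMTimeZero];

    // For every decoded frame: convert the UIImage back to a CVPixelBufferRef
    // (as in the linked answer) and append it with an increasing timestamp
    CMTime frameTime = CMTimeMake(frameIndex, 30); // assuming 30 fps
    if (input.readyForMoreMediaData) {
        [adaptor appendPixelBuffer:pixelBuffer withPresentationTime:frameTime];
    }

    // When the stream ends
    [input markAsFinished];
    [writer finishWritingWithCompletionHandler:^{ /* the .mov is ready at outputURL */ }];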

    I only posted a small part of my code because I believe it is the important part, or in other words, it is where I was having problems.