ios, decode, h.264, rtsp, live555

How to decode a live555 rtsp stream (h.264) MediaSink data using iOS8's VideoToolbox?


Ok, I know that this question is almost the same as get-rtsp-stream-from-live555-and-decode-with-avfoundation, but VideoToolbox is now public in iOS 8, and although I know decoding can be done with this framework, I have no idea how to do it.

My goals are:

  • Connect to a WiFi camera using the RTSP protocol and receive the stream data (done with live555)
  • Decode the data and convert it to UIImages to display on the screen (Motion JPEG-like)
  • Save the streamed data to a .mov file

I reached all these goals using ffmpeg, but unfortunately I can't use it due to my company's policy.

I know that I can display it on the screen using OpenGL too, but this time I have to convert it to UIImages. I also tried the libraries below:

  • ffmpeg: can't use this time due to company's policy. (don't ask me why)

  • libVLC: the display lags by about 2 seconds and I don't have access to the stream data to save it into a .mov file...

  • gstreamer: same as above

I believe that live555 + VideoToolbox will do the job, I just can't figure out how to make it happen...


Solution

  • I did it. VideoToolbox is still poorly documented and there isn't much information out there about video programming (without using ffmpeg), so it cost me more time than I expected.

    For a stream coming from live555, I got the SPS and PPS info and used them to create the CMVideoFormatDescription like this:

    // spsData / ppsData hold the raw SPS and PPS NAL units (no start codes)
    const uint8_t *props[] = {[spsData bytes], [ppsData bytes]};
    size_t sizes[] = {[spsData length], [ppsData length]};

    CMVideoFormatDescriptionRef videoFormat = NULL;
    // 2 parameter sets, 4-byte NAL unit length prefix
    OSStatus result = CMVideoFormatDescriptionCreateFromH264ParameterSets(NULL, 2, props, sizes, 4, &videoFormat);
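
    For reference, here is a minimal sketch of how the spsData / ppsData used above can be pulled out of live555's sprop-parameter-sets (in an Objective-C++ file, since live555 is C++). It assumes subsession is the H.264 MediaSubsession you are already receiving from; record 0 being the SPS and record 1 the PPS is the usual order, but checking the NAL unit type is safer:

    #include "liveMedia.hh" // MediaSubsession, SPropRecord, parseSPropParameterSets

    unsigned numSPropRecords = 0;
    SPropRecord *records = parseSPropParameterSets(subsession->fmtp_spropparametersets(),
                                                   numSPropRecords);
    NSData *spsData = nil, *ppsData = nil;
    if (numSPropRecords >= 2) {
        // record 0 is normally the SPS and record 1 the PPS (check the NAL type if unsure)
        spsData = [NSData dataWithBytes:records[0].sPropBytes length:records[0].sPropLength];
        ppsData = [NSData dataWithBytes:records[1].sPropBytes length:records[1].sPropLength];
    }
    delete[] records; // parseSPropParameterSets allocates the array with new[]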
    

    Now, the difficult part (because I'm a noob at video programming): replace the NAL unit header with a 4-byte length code, as described here

    int headerEnd = 23; // offset in rawData where the real data starts
    uint32_t hSize = (uint32_t)([rawData length] - headerEnd - 4); // payload size after dropping headerEnd + 4 bytes
    uint32_t bigEndianSize = CFSwapInt32HostToBig(hSize);
    // Prepend the payload size as a big-endian 4-byte length prefix...
    NSMutableData *videoData = [NSMutableData dataWithBytes:&bigEndianSize length:sizeof(bigEndianSize)];

    // ...then append the NAL unit payload itself
    [videoData appendData:[rawData subdataWithRange:NSMakeRange(headerEnd + 4, [rawData length] - headerEnd - 4)]];
    

    Now I was able to create a CMBlockBuffer successfully from this raw data and pass the buffer to VTDecompressionSessionDecodeFrame. From there it is easy to convert the resulting CVImageBufferRef to a UIImage... I used this Stack Overflow thread as a reference.
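
    The code for this step is not shown above, so here is a minimal sketch of the block buffer / decode part, assuming videoFormat is the CMVideoFormatDescriptionRef created from the SPS/PPS and videoData is the length-prefixed NAL unit built above. The function names (didDecompress, createSession, decodeFrame) are just placeholders and error handling is omitted:

    #import <UIKit/UIKit.h>
    #import <CoreMedia/CoreMedia.h>
    #import <CoreImage/CoreImage.h>
    #import <VideoToolbox/VideoToolbox.h>

    // Output callback: VideoToolbox calls this once per decoded frame
    static void didDecompress(void *refCon, void *frameRefCon, OSStatus status,
                              VTDecodeInfoFlags flags, CVImageBufferRef imageBuffer,
                              CMTime pts, CMTime duration)
    {
        if (status != noErr || imageBuffer == NULL) return;
        UIImage *image = [UIImage imageWithCIImage:
                             [CIImage imageWithCVPixelBuffer:imageBuffer]];
        // hand `image` to the main thread for display, and/or keep it for recording
    }

    // Create the session once, reusing the videoFormat built from the SPS/PPS
    static VTDecompressionSessionRef createSession(CMVideoFormatDescriptionRef videoFormat)
    {
        VTDecompressionOutputCallbackRecord cb = { didDecompress, NULL };
        VTDecompressionSessionRef session = NULL;
        VTDecompressionSessionCreate(kCFAllocatorDefault, videoFormat, NULL, NULL,
                                     &cb, &session);
        return session;
    }

    // Per frame: wrap the 4-byte-length-prefixed NAL unit (videoData) and decode it
    static void decodeFrame(VTDecompressionSessionRef session,
                            CMVideoFormatDescriptionRef videoFormat, NSData *videoData)
    {
        CMBlockBufferRef blockBuffer = NULL;
        CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault,
                                           (void *)[videoData bytes], [videoData length],
                                           kCFAllocatorNull, // videoData keeps ownership of the bytes
                                           NULL, 0, [videoData length], 0, &blockBuffer);

        CMSampleBufferRef sampleBuffer = NULL;
        const size_t sampleSize = [videoData length];
        CMSampleBufferCreate(kCFAllocatorDefault, blockBuffer, true, NULL, NULL,
                             videoFormat, 1, 0, NULL, 1, &sampleSize, &sampleBuffer);

        VTDecompressionSessionDecodeFrame(session, sampleBuffer, 0, NULL, NULL);

        CFRelease(sampleBuffer);
        CFRelease(blockBuffer);
    }

    Note that VTDecompressionSessionDecodeFrame runs synchronously unless you pass kVTDecodeFrame_EnableAsynchronousDecompression, so videoData only has to stay alive for the duration of the call even though the block buffer does not copy its bytes.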

    And finally, save the stream data, converted to UIImages, following the explanation described in How do I export UIImage array as a movie?
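
    Roughly, that approach boils down to an AVAssetWriter plus a pixel buffer adaptor. In this sketch outputURL, width, height, frameIndex and pixelBuffer are assumed to exist, 30 fps is assumed, and the UIImage to CVPixelBufferRef conversion is the one described in the linked answer:

    #import <AVFoundation/AVFoundation.h>

    // One-time writer setup
    NSError *error = nil;
    AVAssetWriter *writer = [[AVAssetWriter alloc] initWithURL:outputURL
                                                      fileType:AVFileTypeQuickTimeMovie
                                                         error:&error];
    NSDictionary *settings = @{ AVVideoCodecKey  : AVVideoCodecH264,
                                AVVideoWidthKey  : @(width),
                                AVVideoHeightKey : @(height) };
    AVAssetWriterInput *input =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeVideo
                                           outputSettings:settings];
    AVAssetWriterInputPixelBufferAdaptor *adaptor =
        [AVAssetWriterInputPixelBufferAdaptor assetWriterInputPixelBufferAdaptorWithAssetWriterInput:input
                                                                  sourcePixelBufferAttributes:nil];
    [writer addInput:input];
    [writer startWriting];
    [writer startSessionAtSourceTime:kCMTimeZero];

    // For every decoded frame: convert the UIImage back to a CVPixelBufferRef
    // (as in the linked answer) and append it with an increasing timestamp
    CMTime frameTime = CMTimeMake(frameIndex, 30); // assuming 30 fps
    if (input.readyForMoreMediaData) {
        [adaptor appendPixelBuffer:pixelBuffer withPresentationTime:frameTime];
    }

    // When the stream ends
    [input markAsFinished];
    [writer finishWritingWithCompletionHandler:^{ /* the .mov is ready at outputURL */ }];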

    I only posted a small part of my code because I believe it is the important part, or in other words, it is where I was having problems.