I'm using NVENC SDK to encode OpenGL frames and stream them over RTSP. NVENC gives me encoded data in the form of several NAL units. In order to stream them with Live555 I need to find the start code (0x00 0x00 0x01) and remove it. I want to avoid this operation.
NVENC has a sliceOffset attribute which I can consult, but it indicates slices, not NAL units. It only points the ending of the SPS and PPS headers, where the actual data starts. I understand that a slice is not equal to a NAL (correct me if I'm wrong). I'm already forcing single slices for encoded data.
Is any of the following possible?
There seems to be a point where every person trying to do H.264 over RTSP/RTP comes down to this question. Well here are my two cents:
1) There is a concept of an access unit. An access unit is a set of NAL units (may be as well be only one) that represent an encoded frame. That is the level of logic you should work at. If you are saying that you want the encoder to give you individual NAL unit's, then what behavior do you expect when the encoding procedure results in multiple NAL units from one raw frame (e.g. SPS + PPS + coded picture). That being said, there are ways to configure the encoder to reduce the number of NAL units in an access unit (like not including the AUD NAL, not repeating SPS/PPS, exclude SEI NAL's) - with that knowledge you can actually know what to expect and kind of force the encoder to give you single NAL per frame (of course this will not work for all frames, but with the knowledge you have about the decoder you can handle that). I'm not an expert on the NVENC API, I've also just started using it, but at least as for Intel Quick Sync, turning off AUD,SEI and disabling repetition of PPS/SPS gave me roughly around 1 NAL per frame for frames 2...N.
2) Won't be able to answer this since as I mentioned I'm not familiar with the API but I highly doubt this.
3) SPS and PPS should be in the first access unit (the first bit-stream you get from the encoder) and you could just find the right NAL's in the bit-stream and extract them, or there may be a special API call to obtain them from the encoder.
All that being said, I don't think it is that hard to actually run through the bit-stream, parse the start codes and extract the NAL unit's and feed them to Live555 one by one. Of course, if the encoder offers to output the bit-stream in the AVCC format (compared to the start codes or Annex B it uses interleaved length value between the NAL units so you can just jump to the next one without looking for the prefix) then you should use it. When it is just RTP it's easy enough to implement the transport yourself, since I've had bad luck with GStreamer that did not have proper support for FU-A packetization, in case of RTSP the overhead of the transport infrastructure is bigger and it is reasonable to use a 3rd party library like Live555.