Saving video from an RTP stream to a file

I'm trying to get and store the data from an IP camera and I'd appreciate some high level advice as to the best way to do this.

So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.

I'm very happy to do the work, I'd just appreciate some pointers / a high level overview of the steps so I can deconstruct the project!

Solution

There is no direct answer to the OP's question here, because the question is a bit broad, and without further information about what the OP intends to do with the data it is difficult to give a precise answer. What I can do is suggest steps that may be taken and problems to consider.

OP had stated:

So far I've successfully initiated an RTSP conversation with the camera and have it sending me UDP packets with RTP payloads. But I'm unsure where to go from here.

Now that you have established communication with the camera and are receiving the video stream as data packets, it is a matter of understanding what the RTP payloads are and how to interpret that data. At this point you will have to do your research on RTP (the Real-time Transport Protocol, specified in RFC 3550), a network protocol for carrying real-time media such as audio and video. Once you have written structures and functions that work with this protocol, it is a matter of breaking the UDP packets down into useful bytes of information. When processing graphic, video or audio data, whether from a file or from a stream, the data is usually accompanied by some type of header. Next, it is a matter of understanding this header, which normally takes the form of a structure describing the type of content the file or stream holds, so that you know how many bytes to extract and how to interpret them.
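As a rough illustration of the header step described above, here is a minimal C++ sketch that splits one received UDP datagram into the RTP fixed header fields and the payload. The field layout follows RFC 3550; the names `RtpPacket` and `parseRtp` are made up for this example and do not come from any library, and the sketch only skips over CSRC entries and header extensions rather than interpreting them.

```cpp
#include <cstddef>
#include <cstdint>

// Fields of the fixed RTP header (RFC 3550, section 5.1).
// Hypothetical struct for illustration only.
struct RtpPacket {
    std::uint8_t  payloadType;     // 7-bit payload type
    bool          marker;          // marker bit (meaning depends on the payload format)
    std::uint16_t sequenceNumber;  // increments by one per packet
    std::uint32_t timestamp;       // media clock timestamp
    std::uint32_t ssrc;            // stream source identifier
    const std::uint8_t* payload;   // points into the caller's buffer
    std::size_t   payloadSize;
};

// Parses one UDP datagram as an RTP packet.
// Returns false if the datagram is too short or not RTP version 2.
bool parseRtp(const std::uint8_t* data, std::size_t size, RtpPacket& out) {
    if (size < 12) return false;             // fixed header is 12 bytes
    if ((data[0] >> 6) != 2) return false;   // version field must be 2

    const bool hasPadding   = (data[0] & 0x20) != 0;
    const bool hasExtension = (data[0] & 0x10) != 0;
    const std::size_t csrcCount = data[0] & 0x0F;

    // Skip the fixed header and any 32-bit CSRC identifiers.
    std::size_t offset = 12 + 4 * csrcCount;
    if (offset > size) return false;

    // Skip the optional header extension: 16-bit id, 16-bit length in words.
    if (hasExtension) {
        if (offset + 4 > size) return false;
        const std::size_t words =
            (static_cast<std::size_t>(data[offset + 2]) << 8) | data[offset + 3];
        offset += 4 + 4 * words;
        if (offset > size) return false;
    }

    // If padding is present, the last byte says how many bytes to drop.
    std::size_t end = size;
    if (hasPadding) {
        if (offset >= size) return false;
        const std::size_t pad = data[size - 1];
        if (pad == 0 || pad > size - offset) return false;
        end = size - pad;
    }

    out.payloadType    = data[1] & 0x7F;
    out.marker         = (data[1] & 0x80) != 0;
    out.sequenceNumber = static_cast<std::uint16_t>((data[2] << 8) | data[3]);
    out.timestamp      = (static_cast<std::uint32_t>(data[4]) << 24) |
                         (static_cast<std::uint32_t>(data[5]) << 16) |
                         (static_cast<std::uint32_t>(data[6]) << 8)  | data[7];
    out.ssrc           = (static_cast<std::uint32_t>(data[8]) << 24) |
                         (static_cast<std::uint32_t>(data[9]) << 16) |
                         (static_cast<std::uint32_t>(data[10]) << 8) | data[11];
    out.payload        = data + offset;
    out.payloadSize    = end - offset;
    return true;
}
```

You would call `parseRtp` on each datagram your socket receives, then hand `payload` and `payloadSize` to whatever code deals with the codec. The sequence number is what lets you notice lost or reordered packets, which UDP does not protect you from.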

I know it's not going to just be a case of saving the RTP payload directly to a file, but what other steps are involved?

The steps involved may vary depending on your needs and what you intend to do with the information. Are you trying to write the properties or general information about the video content to a file, such as its compression type, its audio/video codec type, its resolution and frame rate, its byte rate, etc.? Or are you trying to write the actual video content itself to a file that your application will use for playback or for editing? This all depends on your intentions.
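If your goal is the second one, writing the actual video content, here is a hedged sketch of what the extra step can look like. It assumes the camera is sending H.264 packetized according to RFC 6184 in non-interleaved mode, and it turns each RTP payload into "Annex B" form, where every NAL unit is preceded by the start code 00 00 00 01. The function name `appendH264Payload` is my own invention. The sketch does no reordering or loss handling, and it does not cover STAP-B, MTAP or FU-B packets.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Appends the NAL unit data carried in one RTP payload to `out`, in Annex B form.
// Returns false for malformed or unsupported payloads (in which case `out`
// may already contain units appended before the problem was found).
bool appendH264Payload(const std::uint8_t* p, std::size_t n,
                       std::vector<std::uint8_t>& out) {
    static const std::uint8_t kStartCode[4] = {0, 0, 0, 1};
    if (n < 1) return false;
    const unsigned type = p[0] & 0x1F;  // low 5 bits of the first byte

    if (type >= 1 && type <= 23) {
        // Single NAL unit packet: the payload is the NAL unit itself.
        out.insert(out.end(), kStartCode, kStartCode + 4);
        out.insert(out.end(), p, p + n);
        return true;
    }

    if (type == 24) {
        // STAP-A: several NAL units, each prefixed by a 16-bit big-endian size.
        std::size_t i = 1;
        while (i + 2 <= n) {
            const std::size_t len = (static_cast<std::size_t>(p[i]) << 8) | p[i + 1];
            i += 2;
            if (len == 0 || len > n - i) return false;
            out.insert(out.end(), kStartCode, kStartCode + 4);
            out.insert(out.end(), p + i, p + i + len);
            i += len;
        }
        return i == n;  // reject trailing garbage
    }

    if (type == 28) {
        // FU-A: one fragment of a NAL unit that was too big for one packet.
        if (n < 2) return false;
        const bool isStart = (p[1] & 0x80) != 0;
        if (isStart) {
            // Rebuild the original NAL header from the FU indicator (F and NRI
            // bits) and the FU header (original NAL unit type).
            out.insert(out.end(), kStartCode, kStartCode + 4);
            out.push_back(static_cast<std::uint8_t>((p[0] & 0xE0) | (p[1] & 0x1F)));
        }
        out.insert(out.end(), p + 2, p + n);
        return true;
    }

    return false;  // packet types not handled in this sketch
}
```

Feeding the payloads through this in sequence-number order and writing `out` to disk should give you a raw H.264 elementary stream, which common players and tools can usually open. One thing to watch for: a decoder needs the SPS and PPS NAL units before the first frame, and some cameras only supply them in the SDP (the `sprop-parameter-sets` attribute) rather than in the stream itself.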

Is the data compressed, in which case I have to decompress it?

Once you can interpret the RTP protocol and have parsed the data packets, by understanding their header information and saving it to a proper structure, it is a matter of using that header information to determine what is actually within the stream. For example, according to the PDF you supplied about the camera's properties, the video can be compressed in two ways, H.264 or MJPEG. You will have to determine which one is in use from the information provided in the header. From there you either branch your code so that it can read and parse each type of compression, or accept the one you are willing to work with and disregard the other.

Next is the audio compression, if you are concerned about the audio. The types available are AAC (encoding only), G.711 A-Law and G.711 U-Law, and the same mechanisms apply here.

Once you are past the audio and video compression, you will need vital information about the video itself, such as the resolution and frame rate (buffer sizes) stored in the header information, so that you know how many bytes to read from the stream and how far to move your pointer through it. Note that the acceptable resolutions and frame rates differ for each type of compression:

H.264
- 1920 x 1080 (2.1MP) @ 30 fps (1080p)
- 1280 x 720 @ 60 fps (720p)*
- 720 x 480/576 @ 30/25 fps (D1)
- 704 x 480/576 @ 30/25 fps (4CIF)
- 352 x 240/288 @ 30/25 fps (CIF)
MJPEG
- 720 x 480/576 @ 30/25 fps (D1)
- 704 x 480/576 @ 30/25 fps (4CIF)
- 352 x 240/288 @ 30/25 fps (CIF)

That covers resolution and frame rate, but the next thing to consider is that you are working with a video stream, so the above may not apply directly in your case. According to the camera's video-stream capabilities, these are the stream types you will have to take into account:

Single-stream H.264 up to 1080p (1920 x 1080) @ 30 fps
Dual-stream H.264 and MJPEG
- H.264: Primary stream programmable up to 1280 x 720 @ 25/20 fps
- MJPEG: Secondary stream programmable up to 720 x 576 @ 25/20 fps

With these different types available on your camera, you have to take all of them into consideration. Again, this depends on the purpose of your application and what you intend to do with the information. You can write your program to accept all of these types, or you can program it to accept only one type in one specific format. This depends on you.
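To make the "which type am I receiving" decision concrete: with RTSP, the stream format is normally announced in the SDP text returned by the DESCRIBE request, in `a=rtpmap:` lines that tie an RTP payload type number to an encoding name. The sketch below collects those into a map. The function name `parseRtpmap` is hypothetical, and the three pre-filled entries are the static payload types from RFC 3551 for G.711 U-Law (PCMU), G.711 A-Law (PCMA) and JPEG, which a camera may use without sending an `rtpmap` line.

```cpp
#include <map>
#include <sstream>
#include <string>

// Maps RTP payload type numbers to encoding names found in "a=rtpmap:" lines,
// e.g. "a=rtpmap:96 H264/90000" yields the entry {96, "H264"}.
std::map<int, std::string> parseRtpmap(const std::string& sdp) {
    std::map<int, std::string> codecs;

    // Static assignments (RFC 3551) that may appear without an rtpmap line.
    codecs[0]  = "PCMU";  // G.711 U-Law
    codecs[8]  = "PCMA";  // G.711 A-Law
    codecs[26] = "JPEG";  // Motion JPEG

    const std::string prefix = "a=rtpmap:";
    std::istringstream lines(sdp);
    std::string line;
    while (std::getline(lines, line)) {
        // SDP lines end in CRLF; getline only strips the LF.
        if (!line.empty() && line.back() == '\r') line.pop_back();
        if (line.compare(0, prefix.size(), prefix) != 0) continue;

        std::istringstream fields(line.substr(prefix.size()));
        int payloadType = 0;
        std::string encoding;  // e.g. "H264/90000"
        if (!(fields >> payloadType >> encoding)) continue;

        // Keep only the name before the first '/' (clock rate follows it).
        codecs[payloadType] = encoding.substr(0, encoding.find('/'));
    }
    return codecs;
}
```

With the map in hand, you can look up the payload type of each incoming RTP packet and either branch to the matching handler or drop the packet if it is a type you have decided not to support.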

Do I have to do any other modifications?

I don't think you would have any modifications to make unless the purpose of your application is to modify the video or audio information itself. If your application only needs to read the file for simple playback, then the answer is no, as long as all the appropriate information was saved properly and the parser for your custom file structure can read in the file's contents and parse the data appropriately for general playback.

Where can I learn about what I'll need to do specific to this camera?

I don't think you need much more information about the camera itself, since the PDF linked in your question already gives you enough to go on. What you need from here is information and documentation about the specific protocols, packet types, compression and stream types, for which a general search should suffice.

UDP

Do a Google search for C++ programming UDP sockets, for either Linux (BSD sockets) or Windows (Winsock).

RTP

Do a Google search for C++ programming RTP packets.

Video Compression

Do a Google search for both H.264 and MJPEG compression, and for structure information on how each is carried as a stream object.

Audio Compression

Do a Google search for each of AAC(encoding only), G.711 A-Law, G.711 U-Law if you are interested in the audio as well.
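To give a feel for how small the audio side can be, here is a sketch that expands one G.711 U-Law byte into a 16-bit linear PCM sample using the commonly published expansion formula. The function name `ulawToLinear` is my own, and A-Law and AAC are not covered here.

```cpp
#include <cstdint>

// Expands one G.711 U-Law encoded byte to a 16-bit linear PCM sample.
std::int16_t ulawToLinear(std::uint8_t u) {
    const int kBias = 0x84;                 // bias added by the encoder
    u = static_cast<std::uint8_t>(~u);      // U-Law bytes are stored inverted
    const int sign     = u & 0x80;          // top bit: sign
    const int exponent = (u >> 4) & 0x07;   // next 3 bits: segment number
    const int mantissa = u & 0x0F;          // low 4 bits: step within segment
    int sample = ((mantissa << 3) + kBias) << exponent;
    sample -= kBias;
    return static_cast<std::int16_t>(sign ? -sample : sample);
}
```

Each byte of a G.711 RTP payload is one sample, so decoding a packet is just a loop over its payload bytes.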

From there, once you have the valid specifications for these data structures as a stream object and have acquired the appropriate header information to determine which type and format the video content is saved as, you should be able to parse the data packets appropriately. How you then save them or write them to a file depends on your intentions.

I have provided this as a guideline to help lead you in the right direction, in much the same way that a chemist, physicist, scientist or engineer would approach any typical problem.

The general steps follow a scientific approach to the problem at hand. Typically these are:

Assess the Situation
Create either a Hypothesis or a Thesis about the Situation
Gather the Known Facts
Determine the Unknowns
Draft a Model that Shows a Relationship Between the Knowns and Unknowns
Perform both Research and Experimentation
Record or Log Events and Data
Analyze the Data
Draw a Conclusion

In the case of writing a software application the concept is similar, but the approach may vary: not all of the steps above may be needed, and some additional steps may be required. One step in the application development cycle that is not found in the scientific approach is debugging. But the general guideline still applies. If you keep to this type of strategy, I am sure you will gain confidence in gathering what you need and in using it, step by step, to achieve your goals.