Tags: websocket, webrtc, video-streaming, live-streaming, mediastream

WebRTC vs WebSockets server to client/s (one to many) live video streaming from IP camera


I couldn't find a definitive answer for this. Let's say I have a server that receives an RTSP feed from an IP camera, and this stream will be broadcast to multiple clients; the connection is always initialized by the clients.

I am wondering whether it would be better in this case to use WebSockets instead of WebRTC to broadcast the media stream. From what I've seen, WebRTC server implementations don't support media channels anyway, so I would need to use data channels, package the stream to be MediaSource compatible, and configure signaling, TURN, and STUN servers, when I could just do the same using WebSockets. Am I missing something, or would WebSockets really be better in this case? Does WebRTC have any features that would make the overhead of implementing it over WebSockets worthwhile?

Edit: Forgot to mention that the clients are web browsers.


Solution

  • Some notes and other things to consider:

    …from what I've seen webRTC server implementations don't support media channels anyways, so I will need to use Data Channels…

    You can run WebRTC media channels server-side, but you're right that there is limited software available for doing so. I actually often end up using headless Chromium on the server because it's easy, but this doesn't work for your use case since your streams are coming in via RTSP.

    If you go the WebRTC route, I'd recommend using GStreamer on the server side. It has its own implementation of everything needed for WebRTC. You can use it to take your existing stream, then remux and transcode it as necessary for WebRTC.

    I could just do the same using WebSockets

    You could, but I would recommend using just regular HTTP at that point. Your stream is unidirectional, from the server to the client, so there's no need for the overhead and hassle of WebSockets. In fact, if you do this right, you don't even need anything special on the client side. Just a video element:

    <video src="https://streams.example.com/your-stream-id" preload="none" controls></video>
    

    The server would need to set up all the video initialization data and then drop into the live stream. The client will just play back the stream no problem.

    I've gone this route using a lightweight Node.js server, wrapping FFmpeg. This way it's trivial to get the video from the source. When I did this, I actually used WebM. All data before the first Cluster element can be treated as initialization data. And then, assuming each Cluster starts with a keyframe (which is usually the case), you can drop into any part of the stream later. (See also: https://stackoverflow.com/a/45172617/362536)

    In other words, take the WebM/Matroska output from FFmpeg and buffer it until you see 0x1F43B675. Everything before that, hang on to it as initialization data. When a client connects, send that initialization data, and then start the "live" stream as soon as you see the next 0x1F43B675. (This is a quick summary to get you started, but if you get stuck implementing, please post a new question.)
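    As a rough illustration of that idea, here is a minimal Node.js sketch (not production code): one FFmpeg process per source writes WebM to stdout, everything before the first Cluster is kept as initialization data, and new clients are attached at the next Cluster boundary. The RTSP URL, codec choices, and port are placeholders, and the case where the Cluster ID is split across two chunks is deliberately ignored to keep it short.

    const http = require('http');
    const { spawn } = require('child_process');

    const CLUSTER_ID = Buffer.from([0x1f, 0x43, 0xb6, 0x75]); // Matroska/WebM Cluster element ID

    // One FFmpeg process per source; stderr is inherited so its log output doesn't block the pipe.
    const ffmpeg = spawn('ffmpeg', [
      '-i', 'rtsp://camera.example/stream',   // placeholder source URL
      '-c:v', 'libvpx', '-c:a', 'libopus',    // transcode to WebM-compatible codecs
      '-f', 'webm', 'pipe:1',
    ], { stdio: ['ignore', 'pipe', 'inherit'] });

    let initData = null;            // everything before the first Cluster (EBML header, Segment info, Tracks)
    let header = Buffer.alloc(0);   // accumulates output until the first Cluster is seen
    const started = new Set();      // clients already receiving media
    const waiting = new Set();      // clients waiting for the next Cluster boundary

    ffmpeg.stdout.on('data', (chunk) => {
      if (!initData) {
        header = Buffer.concat([header, chunk]);
        const idx = header.indexOf(CLUSTER_ID);
        if (idx === -1) return;              // still inside the initialization data
        initData = header.slice(0, idx);     // hang on to this for every future client
        chunk = header.slice(idx);           // the first Cluster onward is live data
      }

      // Forward this chunk to clients that are already mid-stream.
      for (const res of started) res.write(chunk);

      // Waiting clients join at a Cluster boundary so their first media bytes
      // start on (usually) a keyframe.
      const idx = chunk.indexOf(CLUSTER_ID);
      if (idx !== -1) {
        for (const res of waiting) {
          res.writeHead(200, { 'Content-Type': 'video/webm' });
          res.write(initData);
          res.write(chunk.slice(idx));
          started.add(res);
        }
        waiting.clear();
      }
    });

    http.createServer((req, res) => {
      waiting.add(res);
      req.on('close', () => { waiting.delete(res); started.delete(res); });
    }).listen(8080);
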

    Now, what should you do?

    This comes down to some tradeoffs.

    • If you need low latency end-to-end (<2 seconds), you must use WebRTC.
      The whole stack, while complicated, is built around the lowest possible latency. Tradeoffs are made in the encoding, decoding, network, everywhere. This means lower media quality. It means that when packets are lost, everything is done to skip the client forward rather than buffering to try to get lost data. But, all this needs to be done if you require low latency.

    • If you want the simplest implementation, have a high number of clients per source, or want to use existing CDNs, and you don't mind higher latency, consider HLS.
      With a simple FFmpeg command per source, you can have live streams of all your inputs running all the time, and when clients connect they just receive the playlists and media segments. It's a great way to isolate the source end from the serving and the clients, and it allows you to reuse a lot of existing infrastructure. The downsides, of course, are the added latency, and that you really should have the source streams running all the time; otherwise, there will be a relatively long delay when starting the streams initially. Also, HLS gets you adaptive bitrate very easily, costing you only some more CPU for transcoding. (A minimal per-source FFmpeg invocation is sketched after this list.)

    • If you have few clients per source and don't require ABR, consider an HTTP progressive streaming proxy.
      This can be basically a ~10 line Node.js server (sketched after this list) that receives a request for a stream from clients. When a request comes in, it immediately executes FFmpeg to connect to the source, and FFmpeg outputs the WebM stream. This is similar to what I was talking about above, but since there is a separate FFmpeg process per client, you don't need to buffer until Cluster elements or anything. Simply pipe the FFmpeg output directly to the client. This actually gets you pretty low latency; I've gotten it as low as ~300 ms glass-to-glass. The downside is that the client will definitely try to buffer if packets are lost, and then it will fall behind live. You can always skip the player ahead client-side by looking at the buffered time ranges and deciding whether to seek or increase playback speed. (This is exactly what HLS players do when they get too far behind live.) The client side is otherwise just a video element.

    This is a pretty broad topic, so hopefully this answer gives you some more options to consider so that you can decide what's most appropriate for your specific use case. There is no one right answer, but there are definitely tradeoffs, both technical and in ease of development.