discord elixir webrtc http-live-streaming live-streaming

How to build WebRTC m:m audio/video live streams/calls like Discord does? Client to client via gateway for IP protection


mux.com (and also agora.io and so on) is a great service, but very expensive since it's a server solution. I can't use that.

Discord is a great client solution that just uses gateways as a pass-through to hide IP addresses and so on. They described their entire architecture here: https://discord.com/blog/how-discord-handles-two-and-half-million-concurrent-voice-users-using-webrtc. Discord isn't the only one with this approach; AFAIK Instagram does the same, since it's cheap and does the job.

I want to use this solution for my social media app (like Instagram) too, but without the many custom-built things that increase performance. I am a one-man team and can't handle that complexity; still, I don't want to use mux because it's way too expensive for me.

I am okay with the stock/standard performance. Does anyone know of, or can point me to, a tutorial on where to start building such a WebRTC Elixir gateway solution for m:m audio/video live streams/calls?

Maybe there is already published code that I can just copy and paste.

thanks a lot!!

Edit: I've got an answer on the official Elixir forum: https://elixirforum.com/t/how-to-build-webrtc-m-m-audio-video-live-streams-calls-like-discord-does-client-to-client-via-gateway-for-ip-protection/44956


Solution

  • Discord's backend uses an SFU to forward streams between the peers in a video room. The description from the Discord post:

    Discord Voice server contains two components: a signaling component and a media relay component called the selective forwarding unit or SFU. The signaling component fully controls the SFU and is responsible for generating stream identifiers and encryption keys, forwarding speaking indication, etc.
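The split described in that quote can be sketched roughly as follows. This is a minimal illustration, not Discord's actual code: the `Signaling` and `Sfu` classes, their method names, and the 32-byte key size are all assumptions made for the example. The point it shows is that the media relay holds no logic of its own; the signaling side generates the stream identifiers and keys and tells the relay what to route.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class Sfu:
    """Hypothetical media relay: it only knows what signaling tells it."""
    # room_id -> {user_id -> stream_id}
    routes: dict = field(default_factory=dict)

    def add_stream(self, room_id, user_id, stream_id):
        self.routes.setdefault(room_id, {})[user_id] = stream_id

    def remove_stream(self, room_id, user_id):
        self.routes.get(room_id, {}).pop(user_id, None)


class Signaling:
    """Hypothetical signaling component that fully controls the SFU."""

    def __init__(self, sfu):
        self.sfu = sfu
        self._next_stream_id = 1
        self.keys = {}  # (room_id, user_id) -> encryption key

    def join(self, room_id, user_id):
        # Signaling, not the SFU, generates the stream identifier and key.
        stream_id = self._next_stream_id
        self._next_stream_id += 1
        key = secrets.token_bytes(32)  # key size is an assumption
        self.keys[(room_id, user_id)] = key
        self.sfu.add_stream(room_id, user_id, stream_id)
        return {"stream_id": stream_id, "key": key}

    def leave(self, room_id, user_id):
        self.keys.pop((room_id, user_id), None)
        self.sfu.remove_stream(room_id, user_id)


sfu = Sfu()
signaling = Signaling(sfu)
alice = signaling.join("room1", "alice")
bob = signaling.join("room1", "bob")
```

In a real deployment the two components would talk over the network and the key would be delivered to the client over the signaling channel; here they are plain objects in one process to keep the control flow visible.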

    Note that the projects in the answer are written in Elixir (which runs on the Erlang VM), a language that is not commonly used in either live streaming or WebRTC. For example, FFmpeg, x264, libopus, the native WebRTC library, and SRS are all written in C or C++, so you'd better think about it.

    For a video chat product like Discord:

    • The client app, no doubt, can be built on WebRTC, on both web (HTML5) and mobile.
    • For the SFU server, I recommend a C++ server, for example SRS or mediasoup. The whole audio/video ecosystem is C/C++ based, and an SFU has a lot to handle.
    • The signaling server, also called the video room server, can be written in Node.js or Go. It depends on your business logic, so I highly recommend using the language you know best; there is a lot of work to do in this server.

    Also, not all peers in a video room need to publish a video stream; some only play (consume) streams, so it is effectively low-latency live streaming. For more information about live streaming and video chat, please read this post.
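To make the forwarding and the publish/consume distinction concrete, here is a small in-memory sketch of what an SFU room does with incoming media. The `Peer` and `Room` names and the `inbox` list (which stands in for a peer's network connection) are hypothetical; a real SFU such as SRS or mediasoup works on RTP packets over UDP. The sketch only shows the core idea: each packet from a publisher is forwarded unchanged to every other peer, and viewers receive but never send.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    peer_id: str
    publishes: bool  # False means the peer only consumes streams
    inbox: list = field(default_factory=list)  # stands in for the network link


class Room:
    """Hypothetical SFU room: forwards packets, never decodes or mixes them."""

    def __init__(self):
        self.peers = {}

    def join(self, peer):
        self.peers[peer.peer_id] = peer

    def on_packet(self, sender_id, packet):
        sender = self.peers[sender_id]
        if not sender.publishes:
            raise PermissionError(f"{sender_id} is a viewer and cannot publish")
        delivered = 0
        for peer in self.peers.values():
            if peer.peer_id != sender_id:
                # Forward the packet as-is; peers never see each other's IP,
                # only the SFU's.
                peer.inbox.append((sender_id, packet))
                delivered += 1
        return delivered


room = Room()
room.join(Peer("alice", publishes=True))
room.join(Peer("bob", publishes=True))
room.join(Peer("carol", publishes=False))  # viewer only
room.on_packet("alice", b"frame-1")  # reaches bob and carol, not alice
```

Because every client talks only to the room and never directly to another client, this is also where the IP protection asked about in the question comes from.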