I need to know if there's a way to set up a conference call in which the audio of multiple callers can be independently streamed to a server for independent processing.
I'm looking into a way to provide our own real-time transcripts of conference calls without having to diarize the speaker audio.
I implemented something similar (albeit with only two callers) via Twilio Stream Resources. With these you create an individual stream for each call, and each stream is identified by its call SID. You can then feed the streams into a WebSocket server to tie them together and process them in any way you want.
You can find the docs here: https://www.twilio.com/docs/voice/api/stream-resource
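
To give a rough idea of the "tie them together" step, here is a minimal Python sketch of the server side. It is not the code I actually ran. `CallAudioDemux` is a hypothetical helper, not part of any Twilio SDK. The message shape is an assumption based on my reading of the Media Streams docs (JSON text messages with `start`, `media` and `stop` events, a `streamSid` on each, the `callSid` in the `start` message, and base64 audio in `media.payload`), so check it against the current docs before relying on it.

```python
import base64
import json


class CallAudioDemux:
    """Hypothetical helper: groups decoded stream audio by call SID."""

    def __init__(self):
        self.stream_to_call = {}  # streamSid -> callSid, learned from "start"
        self.audio_by_call = {}   # callSid -> bytearray of decoded audio

    def handle(self, raw_message):
        """Process one JSON text message received over the WebSocket."""
        msg = json.loads(raw_message)
        event = msg.get("event")
        if event == "start":
            # Assumed shape: the start message carries the call SID.
            call_sid = msg["start"]["callSid"]
            self.stream_to_call[msg["streamSid"]] = call_sid
            self.audio_by_call.setdefault(call_sid, bytearray())
        elif event == "media":
            call_sid = self.stream_to_call.get(msg["streamSid"])
            if call_sid is not None:
                # Assumed shape: the payload is base64-encoded audio.
                chunk = base64.b64decode(msg["media"]["payload"])
                self.audio_by_call[call_sid] += chunk
        elif event == "stop":
            # Keep the collected audio, just forget the finished stream.
            self.stream_to_call.pop(msg["streamSid"], None)
```

You would call `handle()` on every message your WebSocket server receives, whichever WebSocket library you use. Each entry in `audio_by_call` then holds one caller's audio only, so it can go to its own transcription job without any diarization. For real-time use you would forward each chunk as it arrives rather than buffering the whole call.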