I want to make a web app that does video calls with live transcription, using a third-party speech-to-text service (e.g. Google or Amazon). The peer-to-peer MediaStream would be played to the users and also sent to the API for transcription.
I am currently using https://peerjs.com/ to create the peer-to-peer call.
Is this feasible? Are there any code examples or libraries I could use?
Thank you, Daniel
I think it can easily be done with the Azure Speech to Text service. Other solutions may work too, but for Azure I was able to quickly find all the pieces.
There is a browser use-case example here.
This line is responsible for getting the audio source from the microphone, but in your case it would be more interesting to use the fromStreamInput function, which accepts a MediaStream.
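To make the fromStreamInput idea concrete, here is a rough sketch rather than a tested integration. It assumes the Azure Speech SDK for JavaScript exposes SpeechConfig.fromSubscription, AudioConfig.fromStreamInput and a SpeechRecognizer with recognizing/recognized callbacks. The interfaces SpeechSdkLike, Recognizer and RecognitionEvent and the helper transcribeStream are names I made up for illustration; they are hand-written approximations of the SDK's shape, so a cast may be needed when passing the real module.

```typescript
// Hand-written, simplified types for the parts of the Speech SDK used below.
// These are assumptions for illustration, not the SDK's real declarations.
interface RecognitionEvent {
  result: { text: string };
}

interface Recognizer {
  recognizing: (sender: unknown, event: RecognitionEvent) => void;
  recognized: (sender: unknown, event: RecognitionEvent) => void;
  startContinuousRecognitionAsync(): void;
  stopContinuousRecognitionAsync(): void;
}

interface SpeechSdkLike<TStream> {
  SpeechConfig: { fromSubscription(key: string, region: string): unknown };
  AudioConfig: { fromStreamInput(stream: TStream): unknown };
  SpeechRecognizer: new (speechConfig: unknown, audioConfig: unknown) => Recognizer;
}

// Hypothetical helper: feeds a stream (e.g. the remote MediaStream of a call)
// into continuous recognition and reports partial and final text.
function transcribeStream<TStream>(
  sdk: SpeechSdkLike<TStream>,
  stream: TStream,
  key: string,
  region: string,
  onText: (text: string, isFinal: boolean) => void,
): Recognizer {
  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
  // Use the given stream as the audio source instead of the default microphone.
  const audioConfig = sdk.AudioConfig.fromStreamInput(stream);
  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  // Partial hypotheses while the speaker is still talking.
  recognizer.recognizing = (_sender, event) => onText(event.result.text, false);
  // Final text for a completed utterance.
  recognizer.recognized = (_sender, event) => onText(event.result.text, true);

  recognizer.startContinuousRecognitionAsync();
  return recognizer; // caller can stop it later with stopContinuousRecognitionAsync()
}
```

With PeerJS, the remote MediaStream arrives in the call's 'stream' event, so that handler would be a natural place to call transcribeStream with the imported SDK module and your own key and region, while still attaching the same stream to a video element for playback.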
I’m also wondering what would be the better place to execute this process:
However, this can easily be tested.