Tags: javascript, web-applications, speech-to-text, peerjs, mediastream

How can I get a transcript of an audio or video call in a JS web app, i.e. how do I route a MediaStream to a speech-to-text API?


I want to build a web app that makes video calls with live transcription, using a third-party speech-to-text service (e.g. Google or Amazon). The peer-to-peer MediaStream would be played to the users and also sent to the API for transcription.

I am currently using https://peerjs.com/ to create the peer-to-peer call.

Is this feasible? Are there any code examples or libraries I could use?

Thank you, Daniel


Solution

  • I think this can be done fairly easily with the Azure Speech to Text service. Other services may work too, but for Azure I was able to quickly find all the pieces.

    There is a browser example here.

    One line in that example gets the audio source from the microphone; in your case, the more interesting option is the fromStreamInput function, which accepts a MediaStream.
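    A minimal sketch of that wiring is below. It assumes the microsoft-cognitiveservices-speech-sdk npm package (SpeechConfig.fromSubscription, AudioConfig.fromStreamInput, SpeechRecognizer and its recognized event), your own subscription key and region, and an SDK version recent enough that fromStreamInput accepts a MediaStream. The startTranscription helper and its parameters are hypothetical; the SDK namespace is passed in as an argument only so the wiring can be exercised without a browser.

```javascript
// Sketch: transcribe a MediaStream (e.g. the remote stream of a PeerJS call)
// with the Azure Speech SDK. `sdk` is the SDK namespace, e.g. from
//   import * as sdk from "microsoft-cognitiveservices-speech-sdk";
// startTranscription itself is a hypothetical helper, not an SDK function.
function startTranscription(sdk, mediaStream, { key, region, language = "en-US" }, onText) {
  const speechConfig = sdk.SpeechConfig.fromSubscription(key, region);
  speechConfig.speechRecognitionLanguage = language;

  // Use the MediaStream as the audio source instead of the default microphone.
  const audioConfig = sdk.AudioConfig.fromStreamInput(mediaStream);
  const recognizer = new sdk.SpeechRecognizer(speechConfig, audioConfig);

  // "recognized" fires once per finalized phrase; other result reasons
  // (e.g. NoMatch) are ignored.
  recognizer.recognized = (_sender, event) => {
    if (event.result.reason === sdk.ResultReason.RecognizedSpeech) {
      onText(event.result.text);
    }
  };

  // Keep recognizing until stopContinuousRecognitionAsync() is called.
  recognizer.startContinuousRecognitionAsync();
  return recognizer;
}

// Possible use inside a PeerJS call handler (peer, localStream, videoElement,
// key and region are assumed to exist in your app):
// peer.on("call", (call) => {
//   call.answer(localStream);
//   call.on("stream", (remoteStream) => {
//     videoElement.srcObject = remoteStream; // play the call
//     startTranscription(sdk, remoteStream, { key, region }, console.log);
//   });
// });
```

    The same MediaStream is both played (via srcObject) and handed to the recognizer, which matches the setup described in the question. When the call ends, stop the recognizer with recognizer.stopContinuousRecognitionAsync().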

    I’m also wondering where it would be better to run this process:

    • On the source machine: send the audio to both the receiver and the Speech SDK, then send the transcription to the receiver once you get it.
    • On the receiver machine: after receiving the audio, use the SDK to get the transcription. In this case the audio may be of lower quality because it was compressed during transmission, so the transcription may be worse.
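    The two placements could be wired up roughly as follows. This is a sketch, not tested against a real call: it assumes PeerJS's DataConnection (send and the "data" event) to carry the text and the MediaConnection "stream" event for the remote audio. transcribe(stream, onText) is a hypothetical helper standing in for whatever speech-SDK wrapper you use, and the { type: "transcript", ... } message shape is made up for illustration.

```javascript
// Option 1: transcribe on the source machine and send the text to the peer.
// `dataConnection` is a PeerJS DataConnection (e.g. from peer.connect(remoteId));
// in a real app, only send after its "open" event has fired.
// `transcribe(stream, onText)` is a hypothetical speech-SDK wrapper.
function transcribeAtSource(localStream, dataConnection, transcribe) {
  transcribe(localStream, (text) => {
    dataConnection.send({ type: "transcript", text, sentAt: Date.now() });
  });
}

// Receiving side of option 1: show transcripts sent by the peer.
function receiveTranscripts(dataConnection, onText) {
  dataConnection.on("data", (message) => {
    if (message && message.type === "transcript") {
      onText(message.text);
    }
  });
}

// Option 2: transcribe on the receiver, from the remote (already compressed) stream.
// `mediaConnection` is the PeerJS MediaConnection returned by peer.call() or passed
// to the peer's "call" handler.
function transcribeAtReceiver(mediaConnection, transcribe, onText) {
  let started = false;
  mediaConnection.on("stream", (remoteStream) => {
    // "stream" may fire more than once per call in some PeerJS versions
    // (e.g. once per track), so start only one transcription.
    if (started) return;
    started = true;
    transcribe(remoteStream, onText);
  });
}
```

    In option 1 the audio still travels to the receiver over the normal media call; only the text goes over the data connection, so the recognizer hears the uncompressed microphone signal.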

    However, this is easy to test.
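    One way to run that comparison, assuming you record a few test calls and write a reference transcript by hand, is to score each placement's output with word error rate (WER): the word-level edit distance to the reference divided by the number of reference words. WER is a common metric for speech recognition, but the wordErrorRate helper below is a hypothetical illustration, not part of any SDK, and its normalization (lowercasing, stripping punctuation) is a simplifying assumption.

```javascript
// Word error rate: word-level Levenshtein distance (substitutions, deletions,
// insertions) divided by the number of words in the reference. Lower is better.
function wordErrorRate(reference, hypothesis) {
  const ref = normalize(reference);
  const hyp = normalize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript is empty");
  }

  // prev[j] = edit distance between the first i-1 reference words and the
  // first j hypothesis words (row 0: j insertions).
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = [i]; // i deletions against an empty hypothesis prefix
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // match or substitution
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

// Lowercase, drop punctuation (keep letters, digits, apostrophes), split on whitespace.
function normalize(text) {
  return text
    .toLowerCase()
    .replace(/[^\p{L}\p{N}'\s]/gu, "")
    .split(/\s+/)
    .filter(Boolean);
}
```

    Scoring the source-side and receiver-side transcripts of the same recording against one reference shows whether compression actually hurts recognition for your calls.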