Tags: javascript, python, speech-recognition, azure-cognitive-services

How to implement Speech 'Identification' in JavaScript (or a Flask WebApp)?


I need to implement speech 'identification', i.e. check whether the person trying to log in is really the account owner by matching their voice. If the app fails to recognize a legitimate user, they should be able to bypass the voice check with a PIN set up during initial registration.
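The login flow described above (voice match first, PIN fallback second) could be sketched as follows. This is a minimal illustration, not any service's API: it assumes some speaker-embedding model (not shown, hypothetical) turns each recording into a fixed-length vector, enrolls a user by averaging several such vectors into a "voiceprint", and accepts a login when the cosine similarity to that voiceprint exceeds a threshold. `SIMILARITY_THRESHOLD`, `UserProfile`, `enroll` and `authenticate` are hypothetical names, and the threshold value is a placeholder that would need tuning on real recordings.

```python
import hashlib
import hmac
import math
import os
from dataclasses import dataclass
from typing import List, Optional, Sequence

# Hypothetical placeholder; a real threshold must be tuned on your own data.
SIMILARITY_THRESHOLD = 0.75


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def hash_pin(pin: str, salt: bytes) -> bytes:
    """Salted, deliberately slow hash so the fallback PIN is never stored in plain text."""
    return hashlib.pbkdf2_hmac("sha256", pin.encode("utf-8"), salt, 100_000)


@dataclass
class UserProfile:
    voiceprint: List[float]  # mean of the enrollment embeddings
    pin_salt: bytes
    pin_hash: bytes


def enroll(enrollment_embeddings: List[List[float]], pin: str) -> UserProfile:
    """Average several enrollment embeddings into one voiceprint and store a PIN hash."""
    n = len(enrollment_embeddings)
    dim = len(enrollment_embeddings[0])
    voiceprint = [sum(e[i] for e in enrollment_embeddings) / n for i in range(dim)]
    salt = os.urandom(16)
    return UserProfile(voiceprint, salt, hash_pin(pin, salt))


def authenticate(profile: UserProfile, login_embedding: Sequence[float],
                 pin: Optional[str] = None) -> str:
    """Return 'voice' on a voice match, 'pin' if the fallback PIN matches, else 'denied'."""
    if cosine_similarity(profile.voiceprint, login_embedding) >= SIMILARITY_THRESHOLD:
        return "voice"
    # Constant-time comparison of the PIN hashes avoids timing side channels.
    if pin is not None and hmac.compare_digest(hash_pin(pin, profile.pin_salt),
                                               profile.pin_hash):
        return "pin"
    return "denied"
```

For example, after `profile = enroll([[1.0, 0.0, 0.2], [0.9, 0.1, 0.2]], pin="4821")`, a login embedding close to the voiceprint such as `[0.96, 0.04, 0.21]` returns `"voice"`, while a dissimilar one such as `[0.0, 1.0, 0.0]` returns `"denied"` unless the correct PIN is also supplied, in which case it returns `"pin"`.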

I am using Python and Flask to build the web app, and I have also tagged JavaScript in case there are client-side approaches. So far I have read a few Stack Overflow questions and blog posts, but I couldn't arrive at a working solution.

The most promising option I found was Microsoft's Cognitive Services Speaker Recognition API: https://azure.microsoft.com/en-us/services/cognitive-services/speaker-recognition/

I also considered recording the voice in the browser with Recorder.js and analyzing it on the server, but couldn't get that working.
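For the Recorder.js route, one early server-side step is to check that the uploaded WAV is in the format the recognition backend expects, since Recorder.js records at the browser's audio sample rate (often 44.1 or 48 kHz) rather than a fixed speech rate. The sketch below uses only the standard library; the 16 kHz / 16-bit / mono target is an assumption (a format commonly required by speech services), so confirm it against whichever service you call. `inspect_wav`, `format_problems` and `WavInfo` are hypothetical names.

```python
import io
import wave
from typing import List, NamedTuple

# Assumed target format; verify against the requirements of the service you use.
REQUIRED_RATE = 16000
REQUIRED_SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
REQUIRED_CHANNELS = 1      # mono


class WavInfo(NamedTuple):
    rate: int
    sample_width: int
    channels: int
    duration_s: float


def inspect_wav(data: bytes) -> WavInfo:
    """Parse an uploaded WAV blob in memory (raises wave.Error on non-WAV data)."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        rate = wav.getframerate()
        return WavInfo(rate, wav.getsampwidth(), wav.getnchannels(),
                       wav.getnframes() / rate)


def format_problems(info: WavInfo) -> List[str]:
    """List every way the recording differs from the assumed target format."""
    problems = []
    if info.rate != REQUIRED_RATE:
        problems.append(f"sample rate {info.rate} Hz, expected {REQUIRED_RATE} Hz")
    if info.sample_width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"{8 * info.sample_width}-bit samples, expected "
                        f"{8 * REQUIRED_SAMPLE_WIDTH}-bit")
    if info.channels != REQUIRED_CHANNELS:
        problems.append(f"{info.channels} channels, expected {REQUIRED_CHANNELS}")
    return problems
```

In a Flask view this could be called as `inspect_wav(request.files["audio"].read())`, where `"audio"` is a hypothetical form-field name that the client-side JavaScript would use when posting the Recorder.js blob with `FormData`. Rejecting (or resampling) bad uploads early gives the user a clear error instead of an opaque failure from the recognition service.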

So I'm looking for a way to implement this in the web app. Even a simple gist showing how to use Microsoft Cognitive Services (I read through the documentation, but it didn't help much), or a pure-Python approach, would be helpful.


Solution

  • The documentation at https://learn.microsoft.com/en-us/azure/cognitive-services/speaker-recognition/home is a good starting point. It explains the high-level process (you first use “enrollment” to train a profile for each speaker) and has click-through links to the API reference.

    Speaker identification determines who is speaking from a group of known voices that you have enrolled. It requires labelled data, i.e. audio samples of each known speaker talking: see https://westus.dev.cognitive.microsoft.com/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797. For a login check against a single claimed user, the closely related speaker verification (a one-to-one comparison) is the better fit.

    Alternatively, see the Speech service documentation and the Speech SDK samples linked from it: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/

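    You can also use the batch transcription API with diarization enabled: https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription. Note that diarization only separates a recording into anonymous speakers (who spoke when); it does not match them to enrolled identities, so on its own it cannot authenticate a user.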
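To illustrate the batch transcription option, here is a sketch that only builds (does not send) a job-creation request with diarization turned on. It assumes the v3.1 REST shape (`POST https://<region>.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions` with `contentUrls`, `locale`, `displayName` and `properties.diarizationEnabled`); newer API versions exist, so check the batch transcription documentation linked above before relying on it. `build_transcription_request` and the display name are hypothetical.

```python
import json
import urllib.request


def build_transcription_request(region: str, key: str, audio_url: str,
                                locale: str = "en-US") -> urllib.request.Request:
    """Build a batch-transcription job request with speaker diarization enabled."""
    # Assumed v3.1 endpoint; the API version may differ in current documentation.
    url = f"https://{region}.api.cognitive.microsoft.com/speechtotext/v3.1/transcriptions"
    body = {
        "contentUrls": [audio_url],  # audio must be reachable by the service
        "locale": locale,
        "displayName": "diarization example",  # hypothetical job name
        "properties": {"diarizationEnabled": True},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key,
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would submit the job; batch transcription runs asynchronously, so the results have to be polled for afterwards rather than returned in the same response.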