I needed to implement speech 'identification', ie. Guess if the person who is trying to login, is actually him/her, by matching his/her voice. To consider the case, where the app doesn't recognize the person, but the user is himself trying to login, then he may bypass it with a pin, setup during initial settings.
I am using Python and Flask, to build the webapp, and included javascript in the question, so as to know of possible ways in it too. Till now, i read about it from some sources, but i couldn't arrive at a possible solution, on stack overflow, as well as 'few' blog posts.
The best 'possible' solution i could arrive at was Cognitive Speech Services by Microsoft - https://azure.microsoft.com/en-us/services/cognitive-services/speaker-recognition/
I also thought of recording the voice using the Recorder.js, and analyzing at the server end, but couldn't implement it.
So, i wanted a way to implement it on the web app, even a simple gist with a bit of code on using ms cognitive services (i did read pages of the documentation, but it didnt help much), or doing it by python will be helpful.
The documentation at https://learn.microsoft.com/en-us/azure/cognitive-services/speaker-recognition/home will be helpful. Note that there are clickthrough links to the API reference. It explains the high-level process (use “enrollment” to train).
We have speaker identification, which is distinguishing who is speaking from a group of known voices that you train with. You need to provide labelled data (meaning examples of a known speaker talking): see https://westus.dev.cognitive.microsoft.com/docs/services/563309b6778daf02acc0a508/operations/5645c3271984551c84ec6797.
or Please follow the below link speech SDK samples. https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/
You can use batch transcription api and enable diarization. https://learn.microsoft.com/en-us/azure/cognitive-services/speech-service/batch-transcription