python azure speech-recognition speech-to-text

Continuous speech recognition from microphone on MS Azure


I want to use the Azure Speech service for speech recognition from the microphone. I have a Python program that runs smoothly with recognize_once_async(), but it recognizes only the first utterance and is limited to 15 seconds of audio. I researched the topic and went through the sample code from Microsoft (https://github.com/Azure-Samples/cognitive-services-speech-sdk/blob/master/samples/python/console/speech_sample.py), but couldn't find anything that enables continuous speech recognition from the microphone. Any tips?


Solution

  • You could try the code below:

    import time

    import azure.cognitiveservices.speech as speechsdk

    # Creates an instance of a speech config with specified subscription key and service region.
    # Replace with your own subscription key and region identifier from here: https://aka.ms/speech/sdkregion
    speech_key, service_region = "6.....9", "eastus"
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    
    # Sets the recognition language, then creates a recognizer with the given settings.
    # No audio_config is passed, so the default microphone is the input source.
    speech_config.speech_recognition_language = "en-US"
    # source_language_config = speechsdk.languageconfig.SourceLanguageConfig("en-US", "The Endpoint ID for your custom model.")
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)
    
    done = False

    def stop_cb(evt):
        """Stop continuous recognition when the session stops or is canceled."""
        global done
        print('CLOSING on {}'.format(evt))
        speech_recognizer.stop_continuous_recognition()
        done = True

    # Connect callbacks to the events fired by the speech recognizer
    speech_recognizer.recognizing.connect(lambda evt: print('RECOGNIZING: {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('RECOGNIZED: {}'.format(evt)))
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('SESSION STOPPED {}'.format(evt)))
    speech_recognizer.canceled.connect(lambda evt: print('CANCELED {}'.format(evt)))
    # stop continuous recognition on either session stopped or canceled events
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)
    
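    # Start recognition in the background; it runs until stop_cb sets done
    speech_recognizer.start_continuous_recognition()

    # Keep the main thread alive while recognition is running
    while not done:
        time.sleep(.5)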
    

    Explanation: If you don't pass an audio_config, the recognizer uses the default microphone as its input source.

    To configure or customize the input source, use the AudioConfig class.
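As a rough illustration of customizing the input, the sketch below picks between the default microphone, a named device, and a WAV file. The helper names audio_config_kwargs and make_recognizer, and the parameters wav_path and device_name, are made up for this example; the keyword arguments they map to (use_default_microphone, filename, device_name) are the ones I believe AudioConfig accepts, so please check them against your SDK version.

```python
def audio_config_kwargs(wav_path=None, device_name=None):
    """Choose keyword arguments for speechsdk.audio.AudioConfig.

    wav_path and device_name are hypothetical parameter names for this sketch.
    """
    if wav_path is not None and device_name is not None:
        raise ValueError("pass either wav_path or device_name, not both")
    if wav_path is not None:
        return {"filename": wav_path}        # read audio from a WAV file
    if device_name is not None:
        return {"device_name": device_name}  # use a specific microphone
    return {"use_default_microphone": True}  # same input as omitting audio_config


def make_recognizer(speech_key, service_region, wav_path=None, device_name=None):
    """Build a SpeechRecognizer for the chosen input source."""
    # Imported here so the helper above can be tried without the SDK installed.
    import azure.cognitiveservices.speech as speechsdk

    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(**audio_config_kwargs(wav_path, device_name))
    return speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
```

Calling make_recognizer(speech_key, service_region) with no extra arguments should behave like the code in the answer, since both end up on the default microphone.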

    In continuous recognition, callbacks are available for events such as recognizing, recognized, and canceled.
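The lambdas in the answer print the whole event object. If you only want the text, one option is a small collector whose methods are connected to the events instead. This is a sketch, not the SDK's own pattern: TranscriptCollector is a made-up name, and it assumes that evt.result.text holds the transcript and that the canceled event exposes cancellation_details with reason and error_details.

```python
class TranscriptCollector:
    """Collects text from continuous-recognition events (hypothetical helper)."""

    def __init__(self):
        self.partial = ""   # latest intermediate hypothesis
        self.final = []     # finished utterances, in order
        self.errors = []    # descriptions of cancellations

    def on_recognizing(self, evt):
        # Fired repeatedly while an utterance is still being spoken.
        self.partial = evt.result.text

    def on_recognized(self, evt):
        # Fired once per finished utterance; the text is empty when nothing matched.
        text = evt.result.text
        if text:
            self.final.append(text)
        self.partial = ""

    def on_canceled(self, evt):
        # Assumption: the event carries cancellation_details (reason, error_details).
        details = getattr(evt, "cancellation_details", None)
        if details is not None:
            self.errors.append("{}: {}".format(details.reason, details.error_details))


# Wiring it up, given the speech_recognizer from the answer:
# collector = TranscriptCollector()
# speech_recognizer.recognizing.connect(collector.on_recognizing)
# speech_recognizer.recognized.connect(collector.on_recognized)
# speech_recognizer.canceled.connect(collector.on_canceled)
```

After recognition stops, " ".join(collector.final) gives the full transcript, and collector.errors is worth checking first if the session ended early, for example because of a wrong key or region.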
