Search code examples
pythonazurespeech-to-text

Edit Azure Python code to clean up Speech-to-Text output


I'm using Microsoft Azure's speech-to-text API and it's working well but the output is cumbersome and I'd like to clean it up so that only the recognized speech is displayed.

this is what the output looks like

The python snippet that azure provides is:

try:
    import azure.cognitiveservices.speech as speechsdk
import sys
sys.exit(1)

speech_key, service_region = "***", "***"

weatherfilename = os.path.join(
    os.path.dirname(__file__),
    'orf_audio_2',
    '716_anton.wav')
# def speech_recognize_once_from_file():
    """performs one-shot speech recognition with input from an audio     file"""
# <SpeechRecognitionWithFile>
speech_config = speechsdk.SpeechConfig(subscription=speech_key,     region=service_region)
audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
# Creates a speech recognizer using a file as audio input.
# The default language is "en-us".
speech_recognizer =     speechsdk.SpeechRecognizer(speech_config=speech_config,     audio_config=audio_config)

start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

# Check the result
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
# </SpeechRecognitionWithFile>

Solution

  • result.text in the sample code is the simplest output of recognized speech.

    My test with default microphone:

    enter image description here


    Please refer to below fragment of code which works for me.

    import azure.cognitiveservices.speech as speechsdk
    import time
    # Creates an instance of a speech config with specified subscription key and service region.
    # Replace with your own subscription key and service region (e.g., "westus").
    speech_key, service_region = "***", "***"
    
    weatherfilename = "D:\\whatstheweatherlike.wav"
    
    speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)
    audio_config = speechsdk.audio.AudioConfig(filename=weatherfilename)
    
    # Creates a recognizer with the given settings
    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)
    
    speech_recognizer.session_started.connect(lambda evt: print('SESSION STARTED: {}'.format(evt)))
    speech_recognizer.session_stopped.connect(lambda evt: print('\nSESSION STOPPED {}'.format(evt)))
    speech_recognizer.recognized.connect(lambda evt: print('\n{}'.format(evt.result.text)))
    
    # print('Say a few words\n\n')
    speech_recognizer.start_continuous_recognition()
    time.sleep(10)
    speech_recognizer.stop_continuous_recognition()
    
    speech_recognizer.session_started.disconnect_all()
    speech_recognizer.recognized.disconnect_all()
    speech_recognizer.session_stopped.disconnect_all()
    

    And the output looks like:

    enter image description here