
Converting MP3 to WAV in Python


I'm currently working on a project where I request a recording of a phone call (MP3) and have to produce an automatic transcript with a Python script. I'm using the Azure Speech to Text service and have that part working, but the service only accepts WAV files, and I'm stuck on the conversion.

import azure.cognitiveservices.speech as speechsdk
import time
from os import path
from pydub import AudioSegment
import requests
import hashlib


OID = ***

string = f"***"
encoded = string.encode()
result = hashlib.sha256(encoded)
resultHash = (result.hexdigest())

r = requests.get(f"***", headers={f"***":f"{***}"})
Telefoongesprek = r

# converts audio file (mp3 to Wav.)

#src = Telefoongesprek
#dst = "Telefoongesprek #****.wav"

#sound = AudioSegment.from_mp3(src)
#sound.export(dst, format="wav")

def speech_recognize_continuous_from_file():
    speech_config = speechsdk.SpeechConfig(subscription="***", region="***")
    speech_config.speech_recognition_language = "nl-NL"
    audio_config = speechsdk.audio.AudioConfig(filename="Telefoongesprek #****.wav")

    speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config, audio_config=audio_config)

    done = False

    def stop_cb(evt):
        print('CLOSING on {}'.format(evt))
        nonlocal done
        done = True

    all_results = []

    def handle_final_result(evt):
        # Collect the text of each finalized recognition result.
        all_results.append(evt.result.text)

    # Only `recognized` events carry a final result with text; session events do not.
    #speech_recognizer.recognizing.connect(handle_final_result)
    speech_recognizer.recognized.connect(handle_final_result)
    # Stop waiting once the session ends or recognition is canceled.
    speech_recognizer.session_stopped.connect(stop_cb)
    speech_recognizer.canceled.connect(stop_cb)

    speech_recognizer.start_continuous_recognition()
    while not done:
        time.sleep(.5)

    speech_recognizer.stop_continuous_recognition()

    print(all_results)
speech_recognize_continuous_from_file()

That's the code I'm using, with the keys and encryption removed. Everything works apart from the conversion from MP3 to WAV. Is there any way I can save the requested file locally in this script and pass it to audio_config = speechsdk.audio.AudioConfig(filename="Telefoongesprek #****.wav")? Or do I have to save it to the PC and do it another way? I have been stuck on this problem for over a week and have tried many different approaches. Thanks in advance!

Beau van der Meer


Solution

  • You should be able to save the response data (the raw bytes are available as r.content) to a local .mp3 file and then pass that file path to pydub.

    with open('path/to/local/file.mp3', 'wb') as f:
        f.write(r.content)
    

    Another option is io.BytesIO from the standard library. If you pass it raw bytes, e.g. import io; f = io.BytesIO(r.content), it gives you back an object that behaves like an open file handle, which you can pass to functions that accept files. I didn't check whether the pydub method you are trying to use accepts file handles or only paths, so you will have to check that first.
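
Both options can be sketched roughly as follows. This is a minimal sketch rather than a tested solution for your setup: the helper names save_mp3_bytes and to_speech_wav and the file names are made up for illustration, and pydub needs ffmpeg installed to decode MP3. As far as I know, AudioSegment.from_file takes either a path or a binary file-like object, but it is worth confirming against your installed pydub version. Resampling to 16 kHz, mono, 16-bit PCM is an assumption about what the speech service prefers, so adjust or drop it as needed.

```python
import io
from pathlib import Path


def save_mp3_bytes(data: bytes, path: str) -> Path:
    """Write raw response bytes (e.g. r.content) to a local file."""
    target = Path(path)
    target.write_bytes(data)
    return target


def to_speech_wav(source, dst: str) -> str:
    """Decode MP3 data and export it as a WAV file.

    `source` may be a file path or a binary file-like object such as
    io.BytesIO(r.content).
    """
    # Imported lazily so save_mp3_bytes stays usable without pydub.
    from pydub import AudioSegment

    sound = AudioSegment.from_file(source, format="mp3")
    # Assumption: 16 kHz, mono, 16-bit PCM is a safe input format
    # for the speech service; remove this line if it is not needed.
    sound = sound.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    sound.export(dst, format="wav")
    return dst


# Option 1: go through a local MP3 file.
#   mp3_path = save_mp3_bytes(r.content, "call.mp3")
#   wav_path = to_speech_wav(str(mp3_path), "call.wav")
#
# Option 2: stay in memory until the WAV is written.
#   wav_path = to_speech_wav(io.BytesIO(r.content), "call.wav")
```

Either way, the resulting wav_path is what you would pass as filename= to speechsdk.audio.AudioConfig in your function.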