I'm trying out some of the transcription methods of the SpeechRecognition module. I was able to transcribe using the Google API (`recognize_google()`) just fine, but when I try using OpenAI's Whisper (`recognize_whisper()`), a temporary file `%LocalAppData%\Temp\tmps_pfkh0z.wav` (the actual filename changes slightly each time) is created and the script fails with a "permission denied" error:
```
Traceback (most recent call last):
  File "D:\Users\Renato\Documents\Code\projects\transcriber\.venv\lib\site-packages\whisper\audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
  File "D:\Users\Renato\Documents\Code\projects\transcriber\.venv\lib\site-packages\ffmpeg\_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "d:\Users\Renato\Documents\Code\projects\transcriber\main.py", line 15, in <module>
    print("Transcription: " + r.recognize_whisper(audio_data=audio_data, model="medium", language="uk"))
  File "D:\Users\Renato\Documents\Code\projects\transcriber\.venv\lib\site-packages\speech_recognition\__init__.py", line 1697, in recognize_whisper
    result = self.whisper_model[model].transcribe(
  File "D:\Users\Renato\Documents\Code\projects\transcriber\.venv\lib\site-packages\whisper\transcribe.py", line 85, in transcribe
    mel = log_mel_spectrogram(audio)
  File "D:\Users\Renato\Documents\Code\projects\transcriber\.venv\lib\site-packages\whisper\audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
  File "D:\Users\Renato\Documents\Code\projects\transcriber\.venv\lib\site-packages\whisper\audio.py", line 47, in load_audio
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
C:\Users\Renato\AppData\Local\Temp\tmps_pfkh0z.wav: Permission denied
```
The code itself is pretty straightforward:
```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("audio.wav") as src:
    audio_data = r.record(src)

print("Transcription: " + r.recognize_whisper(audio_data=audio_data, model="medium", language="en"))
```
I tried different installations of ffmpeg (the gyan.dev and BtbN pre-built packages, and I also tried installing through Chocolatey). I also tried unchecking the "Read-only" option in the Temp folder's properties, but the error still happens. I'm running the script in a virtual environment created with venv, on a Windows machine.
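My current understanding of the failure (an assumption on my part, not something confirmed by the library docs): `tempfile.NamedTemporaryFile` keeps the file open while the `with` block runs, and on Windows a file held open that way generally can't be reopened by a second process, so when ffmpeg tries to open it by path it gets "Permission denied". A minimal sketch of the pattern that avoids this, using a Python child process to stand in for ffmpeg:

```python
import os
import subprocess
import sys
import tempfile

# With delete=False we can close our own handle first; only then does a
# separate process try to open the file by name, which works on Windows too.
tmp = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
tmp.write(b"RIFF-stand-in")  # placeholder bytes; a real WAV isn't needed for the demo
tmp.close()  # release the handle BEFORE another process opens the file

# A Python child process stands in for ffmpeg re-opening the file by path.
child = subprocess.run(
    [sys.executable, "-c",
     "import sys; print(open(sys.argv[1], 'rb').read().decode())", tmp.name],
    capture_output=True, text=True,
)
print(child.stdout.strip())  # → RIFF-stand-in
os.remove(tmp.name)  # delete=False means cleanup is on us
```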
So, I got it to work, somehow. The `recognize_whisper` method of the `Recognizer` class in speech_recognition's `__init__.py` file has the line:

```python
with tempfile.NamedTemporaryFile(suffix=".wav") as f:
```

I guess because I run Windows here (yes, I hate it too...), it somehow runs into permission issues. I replaced it with:

```python
with open('test.wav', 'wb') as f:
```
Now the .wav file gets generated and the script runs without error. But it also doesn't show the recognition result...
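A less invasive edit than swapping in a fixed filename (again, my own sketch, not verified against the library's source) would be to keep a temp file but close it before anything else reads it, using `delete=False`:

```python
import os
import tempfile

# Windows-safe alternative to `with tempfile.NamedTemporaryFile(suffix=".wav") as f:`
f = tempfile.NamedTemporaryFile(suffix=".wav", delete=False)
try:
    f.write(b"fake wav bytes")  # the library would write the WAV data here
    f.close()  # close our handle so another opener (e.g. ffmpeg) can access f.name
    with open(f.name, "rb") as g:  # stands in for whisper/ffmpeg reading the file
        data = g.read()
    print(data.decode())  # → fake wav bytes
finally:
    os.remove(f.name)  # delete=False means we clean up manually
```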
Addition: after playing around with speech_recognition some more, I think the Whisper integration is just not working? I tried giving both Whisper and Google the same audio file:
```python
AUDIO_FILE = 'test.wav'

r = sr.Recognizer()
with sr.AudioFile(AUDIO_FILE) as source:
    audio = r.record(source)  # read the entire audio file

r.recognize_whisper(audio)
r.recognize_google(audio)
```
This gives results for the Google recognition but not the Whisper recognition (and I get the permission error again when I put the old code back in the `recognize_whisper()` method).
But if I try the same audio with whisper on its own (see https://github.com/openai/whisper), it works:
```python
import whisper

model = whisper.load_model("base")
result = model.transcribe("test.wav")
print(result["text"])
```