Tags: audio, speech-to-text, openai-whisper

How do I run Whisper on an entire directory?


I'd like to transcribe speech to text using Whisper. I have been able to successfully run it on a single file using the command:

whisper audio.wav

I'd like to run it on a large number of files in a single directory called "Audio" on my desktop. I tried to write this in Python as follows:

import whisper
import os

model = whisper.load_model("base")

for filename in os.listdir('Audio'):   
    model.transcribe(filename)   

It appears to start, but then gives me some errors about "No such file or directory." Is there some way I can correct this to run Whisper on all the .wav files in my Audio directory?

Error:

/opt/homebrew/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
/opt/homebrew/lib/python3.10/site-packages/whisper/transcribe.py:78: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
Traceback (most recent call last):
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/audio.py", line 42, in load_audio
    ffmpeg.input(file, threads=0)
  File "/opt/homebrew/lib/python3.10/site-packages/ffmpeg/_run.py", line 325, in run
    raise Error('ffmpeg', out, err)
ffmpeg._run.Error: ffmpeg error (see stderr output for detail)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/user/Desktop/transcribe.py", line 7, in <module>
    model.transcribe(filename)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/transcribe.py", line 84, in transcribe
    mel = log_mel_spectrogram(audio)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/audio.py", line 111, in log_mel_spectrogram
    audio = load_audio(audio)
  File "/opt/homebrew/lib/python3.10/site-packages/whisper/audio.py", line 47, in load_audio
    raise RuntimeError(f"Failed to load audio: {e.stderr.decode()}") from e
RuntimeError: Failed to load audio: ffmpeg version 5.1.2 Copyright (c) 2000-2022 the FFmpeg developers
  built with Apple clang version 14.0.0 (clang-1400.0.29.202)
  configuration: --prefix=/opt/homebrew/Cellar/ffmpeg/5.1.2_1 --enable-shared --enable-pthreads --enable-version3 --cc=clang --host-cflags= --host-ldflags= --enable-ffplay --enable-gnutls --enable-gpl --enable-libaom --enable-libbluray --enable-libdav1d --enable-libmp3lame --enable-libopus --enable-librav1e --enable-librist --enable-librubberband --enable-libsnappy --enable-libsrt --enable-libtesseract --enable-libtheora --enable-libvidstab --enable-libvmaf --enable-libvorbis --enable-libvpx --enable-libwebp --enable-libx264 --enable-libx265 --enable-libxml2 --enable-libxvid --enable-lzma --enable-libfontconfig --enable-libfreetype --enable-frei0r --enable-libass --enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg --enable-libspeex --enable-libsoxr --enable-libzmq --enable-libzimg --disable-libjack --disable-indev=jack --enable-videotoolbox --enable-neon
  libavutil      57. 28.100 / 57. 28.100
  libavcodec     59. 37.100 / 59. 37.100
  libavformat    59. 27.100 / 59. 27.100
  libavdevice    59.  7.100 / 59.  7.100
  libavfilter     8. 44.100 /  8. 44.100
  libswscale      6.  7.100 /  6.  7.100
  libswresample   4.  7.100 /  4.  7.100
  libpostproc    56.  6.100 / 56.  6.100
221211_1834.wav: No such file or directory

Solution

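  • Your loop fails because os.listdir returns bare file names (such as 221211_1834.wav), so Whisper looks for them in the current working directory instead of inside "Audio". Here's an option that builds the full path for each file. It does the following:

    1 - Finds all .wav files in root_folder and its sub-folders. Change root_folder to the location of your "Audio" folder.

    2 - Shows a progress bar while the files are being transcribed (using tqdm).

    3 - Saves a .txt file containing the transcription next to each .wav file.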

    CODE:

    import os
    import whisper
    from tqdm import tqdm
    
    # Define the folder where the wav files are located
    root_folder = "/Users/downloads"
    
    # Load the Whisper model
    print("Loading whisper model...")
    model = whisper.load_model("base")
    print("Whisper model loaded.")
    
    # Get the number of wav files in the root folder and its sub-folders
    print("Getting number of files to transcribe...")
    num_files = sum(1 for dirpath, dirnames, filenames in os.walk(root_folder) for filename in filenames if filename.endswith(".wav"))
    print("Number of files: ", num_files)
    
    # Transcribe the wav files and display a progress bar
    with tqdm(total=num_files, desc="Transcribing Files") as pbar:
        for dirpath, dirnames, filenames in os.walk(root_folder):
            for filename in filenames:
                if filename.endswith(".wav"):
                    filepath = os.path.join(dirpath, filename)
                    result = model.transcribe(filepath, fp16=False, verbose=True)
                    transcription = result['text']
                    # Write transcription to text file
                    filename_no_ext = os.path.splitext(filename)[0]
                    with open(os.path.join(dirpath, filename_no_ext + '.txt'), 'w', encoding='utf-8') as f:
                        f.write(transcription)
                    pbar.update(1)
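
If the sub-folder walk and progress bar are more than you need, a smaller change to the loop from the question may be enough: pass Whisper the full path to each file rather than the bare name that os.listdir gives you. The following is only a minimal sketch under a few assumptions. The helper names find_wav_files and transcribe_folder are made up for illustration, only files directly inside the folder are considered, and model is assumed to be an object with a Whisper-style transcribe(path) method that returns a dict with a "text" key.

```python
from pathlib import Path


def find_wav_files(folder):
    """Return the .wav files directly inside `folder` as full paths, sorted by name.

    Hypothetical helper: the extension check is case-insensitive and
    sub-folders are not searched.
    """
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".wav"
    )


def transcribe_folder(folder, model):
    """Transcribe every .wav file in `folder` and return {file name: text}.

    Hypothetical helper: `model` is assumed to expose a Whisper-style
    `transcribe(path)` method returning a dict with a "text" key.
    """
    results = {}
    for path in find_wav_files(folder):
        # Pass the full path (folder + file name), not just the file name.
        result = model.transcribe(str(path))
        results[path.name] = result["text"]
    return results


# Usage (assumes the openai-whisper package is installed and that the
# script is run from the directory that contains the "Audio" folder):
#   import whisper
#   texts = transcribe_folder("Audio", whisper.load_model("base"))
```

Keeping the file discovery separate from the transcription call makes it easy to print the list of paths first and confirm they point where you expect before starting a long run.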