Tags: python, openai-api, openai-whisper

Can Distilled Whisper Models be used as a Drop-In Replacement for OpenAI Whisper?


I have a working video transcription pipeline that uses a local OpenAI Whisper model. I would like to use the equivalent distilled model ("distil-small.en"), which is smaller and faster.

def transcribe(self):
    file = "/path/to/video"

    model = whisper.load_model("small.en")          # WORKS
    model = whisper.load_model("distil-small.en")   # DOES NOT WORK 

    transcript = model.transcribe(word_timestamps=True, audio=file)
    print(transcript["text"])

However, I get an error that the model was not found:

RuntimeError: Model distil-small.en not found; available models = ['tiny.en', 'tiny', 'base.en', 'base', 'small.en', 'small', 'medium.en', 'medium', 'large-v1', 'large-v2', 'large-v3', 'large']

I installed my dependencies with Poetry (which uses pip under the hood) as follows:

[tool.poetry.dependencies]
python = "^3.11"
openai-whisper = "*"
transformers = "*" # distilled whisper models
accelerate = "*" # distilled whisper models
datasets = { version = "*", extras = ["audio"] } # distilled whisper models

The Distil-Whisper documentation on GitHub appears to use a different approach (via Hugging Face Transformers) to installing and using these models.
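
For reference, their README shows something like the following Transformers-based pipeline (a sketch; the audio path and generation options here are illustrative, not from my pipeline):

import torch
from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline

# Pick GPU + half precision when available, otherwise CPU + full precision.
device = "cuda:0" if torch.cuda.is_available() else "cpu"
torch_dtype = torch.float16 if torch.cuda.is_available() else torch.float32

model_id = "distil-whisper/distil-small.en"
model = AutoModelForSpeechSeq2Seq.from_pretrained(
    model_id, torch_dtype=torch_dtype, low_cpu_mem_usage=True, use_safetensors=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id)

# Wrap the model in an ASR pipeline instead of calling whisper.load_model.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model,
    tokenizer=processor.tokenizer,
    feature_extractor=processor.feature_extractor,
    max_new_tokens=128,
    torch_dtype=torch_dtype,
    device=device,
)

result = pipe("/path/to/audio")  # placeholder path
print(result["text"])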

Is it possible to use a Distilled model as a drop-in replacement for a regular Whisper model?


Solution

  • load_model called with a model name only accepts OpenAI's known list of models. If you want to use your own model, you need to download it from the Hugging Face Hub (or elsewhere) first and pass the checkpoint's file path instead. See: https://huggingface.co/distil-whisper/distil-small.en#running-whisper-in-openai-whisper

    import torch
    from datasets import load_dataset
    from huggingface_hub import hf_hub_download
    from whisper import load_model, transcribe

    # Download the checkpoint in OpenAI Whisper format from the Hugging Face Hub,
    # then load it by file path rather than by model name.
    distil_small_en = hf_hub_download(repo_id="distil-whisper/distil-small.en", filename="original-model.bin")
    model = load_model(distil_small_en)

    # Load a short validation sample and convert it to a float tensor for transcription.
    dataset = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
    sample = dataset[0]["audio"]["array"]
    sample = torch.from_numpy(sample).float()

    pred_out = transcribe(model, audio=sample)
    print(pred_out["text"])
    
    
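    Once the checkpoint is loaded by path like this, the model should behave as a drop-in replacement, so a call in the style of your original pipeline should work (a sketch, reusing the placeholder video path and word-timestamp option from your question):

    # Assumes `model` was loaded from the downloaded distil-small.en checkpoint above.
    transcript = model.transcribe(word_timestamps=True, audio="/path/to/video")
    print(transcript["text"])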

    You can also see in the openai-whisper source that load_model only accepts a name from its known model list (or an existing file path), which is why you got the error shown above.
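
    For reference, openai-whisper exposes that list of known names, so a quick check (a minimal sketch) confirms the distilled names are not in it, matching the error message:

    import whisper

    # Prints the same list as in the error message; "distil-small.en" is not a known name,
    # so it has to be loaded from a downloaded checkpoint path instead.
    print(whisper.available_models())
    print("distil-small.en" in whisper.available_models())  # False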