I can successfully use the Whisper CLI to transcribe a WAV audio file. I use the command:
whisper --language en --model tiny --device cpu .tmp/audio/chunk1.wav
The binary is located here, and I am using Python 3.11:
dev@host ~/Development $ whereis whisper
whisper: /home/dev/Development/whispervm/.direnv/python-3.11/bin/whisper
Then I create a script that in theory should do the exact same thing, but it recognizes my NVIDIA card, attempts to use CUDA, and fails even though I explicitly request the "cpu" device.
#!/usr/bin/env python
import whisper
# Whisper has multiple models you can load, depending on size and requirements
model = whisper.load_model("tiny").to("cpu")
# path to the audio file you want to transcribe
PATH = ".tmp/audio/chunk1.wav"
result = model.transcribe(PATH, fp16=False)
print(result["text"])
The output is:
Found GPU0 Quadro K4000 which is of cuda capability 3.0.
PyTorch no longer supports this GPU because it is too old.
The minimum cuda capability supported by this library is 3.7.
warnings.warn(old_gpu_warn % (d, name, major, minor, min_arch // 10, min_arch % 10))
Traceback (most recent call last):
File "/home/dev/Development/whisper/test.py", line 2, in <module>
model = whisper.load_model("tiny").to("cpu")
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dev/Development/whispervm/.direnv/python-3.11/lib/python3.11/site-packages/whisper/__init__.py", line 149, in load_model
model.load_state_dict(checkpoint["model_state_dict"])
File "/home/dev/Development/whispervm/.direnv/python-3.11/lib/python3.11/site-packages/torch/nn/modules/module.py", line 2041, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for Whisper:
While copying the parameter named "encoder.blocks.0.attn.query.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n',).
While copying the parameter named "encoder.blocks.0.attn.key.weight", whose dimensions in the model are torch.Size([384, 384]) and whose dimensions in the checkpoint are torch.Size([384, 384]), an exception occurred : ('CUDA error: no kernel image is available for execution on the device\nCUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.\nFor debugging consider passing CUDA_LAUNCH_BLOCKING=1.\nCompile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.\n',).
and many more parameter errors follow. It does not transcribe. I'm thinking this might be a bug.
tl;dr: Whisper will not transcribe on the CPU from a Python script, even though it does from the CLI.
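For reference, a quick diagnostic (hypothetical; not part of my original script) confirms that PyTorch still enumerates the old GPU, which is presumably why Whisper tries CUDA:

import torch

print(torch.__version__)              # 2.0.1 in this environment
print(torch.cuda.is_available())      # True, even though the Quadro K4000 is unsupported
print(torch.cuda.get_device_name(0))  # the card PyTorch detected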
Edit: installed pip packages list
Package Version
------------------------ ----------
bcrypt 4.0.1
certifi 2023.7.22
cffi 1.16.0
charset-normalizer 3.3.0
cmake 3.27.6
cryptography 41.0.4
decorator 5.1.1
Deprecated 1.2.14
fabric 3.2.2
filelock 3.12.4
idna 3.4
invoke 2.2.0
Jinja2 3.1.2
lit 17.0.2
llvmlite 0.41.0
MarkupSafe 2.1.3
more-itertools 10.1.0
mpmath 1.3.0
networkx 3.1
numba 0.58.0
numpy 1.25.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-cupti-cu11 11.7.101
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
nvidia-cufft-cu11 10.9.0.58
nvidia-curand-cu11 10.2.10.91
nvidia-cusolver-cu11 11.4.0.1
nvidia-cusparse-cu11 11.7.4.91
nvidia-nccl-cu11 2.14.3
nvidia-nvtx-cu11 11.7.91
openai-whisper 20230918
paramiko 3.3.1
pip 23.2.1
pycparser 2.21
pydub 0.25.1
PyNaCl 1.5.0
regex 2023.10.3
requests 2.31.0
setuptools 68.1.2
sympy 1.12
tiktoken 0.3.3
torch 2.0.1
tqdm 4.66.1
triton 2.0.0
typing_extensions 4.8.0
urllib3 2.0.6
wheel 0.41.2
wrapt 1.15.0
Found the error.
I looked at the source code, and it turns out the device has to be passed to the load_model() call itself, contrary to what I had been reading on blogs.
So the correct script looks like this:
#!/usr/bin/env python
import whisper

audio_file = "/home/dev/Development/whispervm/.tmp/audio/chunk1.wav"
audio = whisper.load_audio(audio_file)

# Pass the device here; otherwise load_model() picks CUDA when a GPU is visible
model = whisper.load_model("tiny", device="cpu")

result = model.transcribe(audio, fp16=False)  # fp16=False suppresses the FP16-on-CPU warning
print(result["text"])
I read that if you don't specify the device, it's supposed to default to the CPU. In fact, load_model() defaults to CUDA whenever a CUDA device is detected, and when your card is too old for the more recent PyTorch versions, loading fails.
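For reference, this is roughly the device-selection logic inside whisper.load_model(), paraphrased from the openai-whisper source (the exact code may differ between versions); the helper name default_device is mine:

import torch

def default_device(device=None):
    # When no device is given, CUDA wins as soon as torch.cuda.is_available()
    # returns True -- even for a GPU too old for the kernels shipped with
    # recent PyTorch wheels, which is what produced the "no kernel image
    # is available" errors above.
    if device is None:
        device = "cuda" if torch.cuda.is_available() else "cpu"
    return device

print(default_device())       # "cuda" on this machine, hence the crash
print(default_device("cpu"))  # an explicit device wins, hence the fix

An alternative workaround should be to hide the GPU from PyTorch entirely, e.g. running CUDA_VISIBLE_DEVICES="" python test.py, so that torch.cuda.is_available() returns False and the default falls back to the CPU.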