Tags: python, pytorch, google-colaboratory, large-language-model, openai-whisper

RuntimeError: Library libcublas.so.11 is not found or cannot be loaded


I am working on an LLM project on Google Colab using a V100 GPU in High-RAM mode, and these are my dependencies:

git+https://github.com/pyannote/pyannote-audio
git+https://github.com/huggingface/[email protected]
openai==0.28
ffmpeg-python
pandas==1.5.0
tokenizers==0.14
torch==2.1.1
torchaudio==2.1.1
tqdm==4.64.1
EasyNMT==2.0.2
psutil==5.9.2
requests
pydub
docxtpl
faster-whisper==0.10.0
git+https://github.com/openai/whisper.git

Here is everything I import:

from faster_whisper import WhisperModel
from datetime import datetime, timedelta
from time import time
from pathlib import Path
import pandas as pd
import os
from pydub import AudioSegment
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

import requests

import torch
import pyannote.audio
from pyannote.audio.pipelines.speaker_verification import PretrainedSpeakerEmbedding
from pyannote.audio import Audio
from pyannote.core import Segment

import wave
import contextlib
import psutil

import openai
from codecs import decode

from docxtpl import DocxTemplate

I used to use torch and torchaudio at their latest versions, but they got an update yesterday (15 December 2023, when v2.1.2 was released). I assumed the error I was getting was caused by the update, so I pinned them to the version my code was working with two days ago (v2.1.1). Obviously, that did not work; the error I started getting is the RuntimeError in the title.
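For reference, the pin amounts to a Colab cell along these lines (a sketch; same versions as in the dependency list above):

!pip install torch==2.1.1 torchaudio==2.1.1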

Everything was working two days ago and I didn't change anything in my notebook. The only thing that may have changed is the dependencies, but pinning to the prior versions did not fix the problem. Here is the code snippet that throws the error:

def EETDT(audio_path, whisper_model, num_speakers, output_name="diarization_result", selected_source_lang="eng", transcript=None):
    """
    Uses Whisper to separate audio into segments and generate a transcript for each
    segment.

    Speech Recognition is based on models from OpenAI Whisper https://github.com/openai/whisper
    Speaker diarization model and pipeline from https://github.com/pyannote/pyannote-audio

    audio_path : str -> path to wav file
    whisper_model : str -> small/medium/large/large-v2/large-v3
    num_speakers : int -> number of speakers in audio (0 to let the function determine it)
    output_name : str -> Desired name of the output file
    selected_source_lang : str -> source language code
    """

    audio_name = audio_path.split("/")[-1].split(".")[0]

    model = WhisperModel(whisper_model, compute_type="int8")
    time_start = time()
    if audio_path is None:
        raise ValueError("Error no video input")
    print("Input file:", audio_path)
    if not audio_path.endswith(".wav"):
        print("Submitted audio isn't in wav format. Starting conversion...")
        audio = AudioSegment.from_file(audio_path)
        audio_suffix = audio_path.split(".")[-1]
        new_path = audio_path.replace(audio_suffix,"wav")
        audio.export(new_path, format="wav")
        audio_path = new_path
        print("Converted to wav:", new_path)
    try:
        # Get duration
        with contextlib.closing(wave.open(audio_path,'r')) as f:
            frames = f.getnframes()
            rate = f.getframerate()
            duration = frames / float(rate)
        if duration<30:
            raise ValueError(f"Audio has to be longer than 30 seconds. Current: {duration}")
        print(f"Duration of audio file: {duration}")

        # Transcribe audio
        options = dict(language=selected_source_lang, beam_size=5, best_of=5)
        transcribe_options = dict(task="transcribe", **options)
        segments_raw, info = model.transcribe(audio_path, **transcribe_options)

        # Convert back to original openai format
        segments = []
        i = 0
        full_transcript = list()
        if not isinstance(transcript, pd.DataFrame):
            for segment_chunk in segments_raw: # <-- THROWS ERROR
                chunk = {}
                chunk["start"] = segment_chunk.start
                chunk["end"] = segment_chunk.end
                chunk["text"] = segment_chunk.text
                full_transcript.append(segment_chunk.text)
                segments.append(chunk)
                i += 1
            full_transcript = "".join(full_transcript)
            print("Transcribe audio done with fast-whisper")
        else:
            for i in range(len(transcript)):
                full_transcript.append(transcript["text"].iloc[i])
            full_transcript = "".join(full_transcript)
            print("You inputted pre-transcribed audio")

    except Exception as e:
        # chain the original exception so its traceback is not lost
        raise RuntimeError("Error converting video to audio") from e
 ...The code never makes it past the try block; the failure happens at the marked line...
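For what it's worth, a stripped-down version of the failing path looks roughly like this (a sketch with a hypothetical sample.wav; faster-whisper's transcribe returns a lazy generator, so the actual decoding only happens once the segments are iterated, which matches the marked line above):

from faster_whisper import WhisperModel

model = WhisperModel("large-v2", compute_type="int8")
segments, info = model.transcribe("sample.wav", language="en", beam_size=5, best_of=5)
for segment in segments:  # decoding (and the CUDA library load) happens lazily here
    print(segment.start, segment.end, segment.text)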


Solution

  • Run

    !apt install libcublas11

    in a Colab cell. The error means the CUDA 11 cuBLAS library (libcublas.so.11) cannot be found in the runtime, most likely because the current Colab image no longer ships the CUDA 11 libraries by default; the libcublas11 apt package provides that shared library. I used this and it worked.
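
    A quick sanity check after installing (a sketch): if the line below runs without raising OSError, the library can be found, and faster-whisper should be able to load it too.

    import ctypes

    # attempts to load the CUDA 11 cuBLAS library; raises OSError if it is still missing
    ctypes.CDLL("libcublas.so.11")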