Tags: python, machine-learning, librosa, waveform

Problem resampling audio with librosa


I am trying to fine-tune a wav2vec2 model on my own dataset. To do that I loaded the audio files, and now I want to downsample them to 16 kHz. But the librosa.resample function raises an error that I couldn't resolve. The error message is:

resample() takes 1 positional argument but 3 were given

At first I tried loading the audio directly with librosa at a 16 kHz sampling rate, but I have little experience in this field and that approach caused problems in a later part of my project. I then found some code that is supposed to resample the audio signal, but when I tried it I ran into the error above.

This part works fine:

database={}
audios = []
psr = []
for path in df['audio']:
  speech_array,sr = torchaudio.load(path)
  audios.append(speech_array[0].numpy())
  psr.append(sr)
database['audio'] = audios
database['psr'] = psr

And this part raises the error for every index:

import librosa
import numpy as np

# 'database' is the dict built above, with 'audio' and 'psr' keys

# List to store new sampling rates
new_sr = []

# Resample each audio signal and store the new sampling rate
for i in range(len(database['psr'])):
    try:
        audio_signal = np.asarray(database['audio'][i])  # Convert audio to numpy array
        original_sr = database['psr'][i]  # Original sampling rate

        # Check if the audio signal is mono (single-channel)
        if audio_signal.ndim == 1:
            # Resample mono audio signal
            resampled_audio = librosa.resample(audio_signal, original_sr, 16000)
        else:
            # Resample each channel separately for multi-channel audio
            resampled_channels = []
            for channel in audio_signal:
                resampled_channel = librosa.resample(channel, original_sr, 16000)
                resampled_channels.append(resampled_channel)
            resampled_audio = np.array(resampled_channels)

        # Store resampled audio back in the dict
        database['audio'][i] = resampled_audio

        # Store new sampling rate (16000 Hz)
        new_sr.append(16000)
    except Exception as e:
        print(f"Error processing audio at index {i}: {e}")

# Add new sampling rates to the dict
database['newsr'] = new_sr

Solution

  • Here is the definition of resample:

    @cache(level=20)
    def resample(
        y: np.ndarray,
        *,  # forces you to pass all the following arguments only as named ones
        orig_sr: float,
        target_sr: float,
        res_type: str = "soxr_hq",
        fix: bool = True,
        scale: bool = False,
        axis: int = -1,
        **kwargs: Any,
    ) -> np.ndarray:
    

    The docs also provide an example of calling it with keyword arguments:

    y, sr = librosa.load(librosa.ex('trumpet'), sr=22050)
    y_8k = librosa.resample(y, orig_sr=sr, target_sr=8000)
    

    So in your case the resample calls should be:

    # Resample mono audio signal
    resampled_audio = librosa.resample(audio_signal, 
                                       orig_sr=original_sr,
                                       target_sr=16000)
    ...
    
    # Resample each channel separately for multi-channel audio
    resampled_channel = librosa.resample(channel, 
                                         orig_sr=original_sr, 
                                         target_sr=16000)
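
If you want to see why the positional call fails without involving librosa at all, here is a minimal sketch. The resample function below is only a stand-in I wrote for illustration: it copies the parameter layout shown above (one positional argument, then a bare `*`), does no real resampling, and just returns how many samples the output would have. The 48 kHz input rate and the silent test signal are assumptions for the demo.

```python
import math


def resample(y, *, orig_sr, target_sr):
    """Stand-in with the same parameter layout as librosa.resample.

    It does no real resampling; it only returns the number of samples
    the output would have, so the calling convention can be tried out.
    """
    return math.ceil(len(y) * target_sr / orig_sr)


signal = [0.0] * 48000  # one second of silence at an assumed 48 kHz

# Positional call, as in the question: rejected because of the bare `*`.
try:
    resample(signal, 48000, 16000)
    positional_error = None
except TypeError as exc:
    positional_error = str(exc)
print(positional_error)

# Keyword call, as in the fix: accepted.
n_out = resample(signal, orig_sr=48000, target_sr=16000)
print(n_out)
```

Everything after the `*` can only be passed by name, so the fix in your loop is purely a change in how the arguments are written, not in what is passed.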