MFCC feature extraction, Librosa


I want to extract MFCC features from an audio file sampled at 8000 Hz, using a frame size of 20 ms with 10 ms overlap. What parameters should I pass to librosa.feature.mfcc()? Does the code below specify 20 ms frames with 10 ms overlap?

import librosa as l

x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
mfccs = l.feature.mfcc(y=x, sr=sr, n_mfcc = 24, hop_length = 160)

The audio file is 1800 seconds long. Does that mean I would get 24 MFCCs for each of the (1800/0.01)-1 chunks of the audio?


Solution

  • 1800 seconds at 8000 Hz are 1800 * 8000 = 14400000 samples. With a hop length of 160, you get roughly 14400000 / 160 = 90000 MFCC frames with 24 coefficients each. That is clearly not (1800 / 0.01) - 1 = 179999; it is off by a factor of roughly 2.

    Note that I said roughly, because I only used the hop length and ignored the window length. The hop length is the number of samples the window moves with each step. How many hops fit into the signal depends on whether you pad it. If you do not pad, the number of frames also depends on the window size.
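
    To make the effect of padding and window size concrete, here is a minimal sketch of the frame-count arithmetic for the hop length of 160 from the question. The helper `frame_count` is hypothetical, not a librosa function. It assumes the usual STFT framing convention, where centering pads `n_fft // 2` samples on each side (as far as I know this is what librosa does by default with `center=True`), that the default window is 2048 samples when `n_fft` is not given, and that the file is exactly 1800 s long.

```python
def frame_count(n_samples: int, n_fft: int, hop_length: int, center: bool = True) -> int:
    """Number of analysis frames (hypothetical helper, not a librosa function)."""
    if center:
        # Assumption: centering pads n_fft // 2 samples on both sides
        n_samples += 2 * (n_fft // 2)
    return 1 + (n_samples - n_fft) // hop_length

n_samples = 1800 * 8000  # 14400000 samples, assuming exactly 1800 s

# Hop length of 160 samples, as in the question
with_padding = frame_count(n_samples, n_fft=2048, hop_length=160, center=True)
no_padding_long_window = frame_count(n_samples, n_fft=2048, hop_length=160, center=False)
no_padding_short_window = frame_count(n_samples, n_fft=160, hop_length=160, center=False)

print(with_padding, no_padding_long_window, no_padding_short_window)
```

    Under these assumptions the count is close to 90000 in every case, but the exact number shifts with padding and, when you do not pad, with the window length.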

    To get back to your question: ask yourself how many samples correspond to 10 ms.

    If 1 s contains 8000 samples (that's what 8000 Hz means), how many samples are in 0.01 s? That's 8000 * 0.01 = 80 samples.

    This means you need a hop length of 80 samples and a window length of 160 samples (0.02 s, twice as long).
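
    The same conversion can be written as a small helper. This is only a sketch: `ms_to_samples` is a hypothetical name, not part of librosa, and the frame count at the end assumes the file is exactly 1800 s long and that no padding is applied.

```python
def ms_to_samples(duration_ms: float, sr: int) -> int:
    """Convert milliseconds to a sample count (hypothetical helper, not a librosa function)."""
    return int(round(sr * duration_ms / 1000))

sr = 8000
n_fft = ms_to_samples(20, sr)       # window length in samples
hop_length = ms_to_samples(10, sr)  # hop length in samples

# Assumption: exactly 1800 s of audio and no padding
n_samples = 1800 * sr
n_frames = 1 + (n_samples - n_fft) // hop_length

print(n_fft, hop_length, n_frames)
```

    Under these assumptions the unpadded count is 179999, which is the (1800 / 0.01) - 1 from the question. With padding the count will differ slightly.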

    Now you should tell librosa to use this info, like this:

    import librosa as l
    
    x, sr = l.load('/home/user/Data/Audio/Tracks/Dev/FS_P01_dev_001.wav', sr = 8000)
    n_fft = int(sr * 0.02)   # window length: 0.02 s
    hop_length = n_fft // 2  # usually one specifies the hop length as a fraction of the window length
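    # recent librosa versions require the signal to be passed as the keyword argument y
    mfccs = l.feature.mfcc(y=x, sr=sr, n_mfcc=24, hop_length=hop_length, n_fft=n_fft)
    # check the dimensions: (n_mfcc, number of frames)
    print(mfccs.shape)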
    

    Hope this helps.