Search code examples
pythonnumpyscipysignal-processingfft

Python: performing FFT on music file


I am trying to perform a FFT on a song (audio file in wav format, about 3 minutes long) which I created as follows, just in case it is relevant.

ffmpeg -i "$1" -vn -ab 128k -ar 44100 -y -ac 1 "${1%.webm}.wav"

Where $1 is the name of a webm file.

This is the code which is supposed to display a FFT of the given file:

import numpy as np
import matplotlib.pyplot as plt

# presume file already converted to wav.
file = os.path.join(temp_folder, file_name)

rate, aud_data = scipy.io.wavfile.read(file)

# wav file is mono.
channel_1 = aud_data[:]

fourier = np.fft.fft(channel_1)

plt.figure(1)
plt.plot(fourier)
plt.xlabel('n')
plt.ylabel('amplitude')
plt.show()

The problem is, it takes for ever. It takes so long that I cannot even show the output, since I've had plenty of time to research and write this post and it still has not finished.

I presume that the file is too long, since

print (aud_data.shape)

outputs (9218368,), but this looks like a real world problem, so I hope there is a way to obtain an FFT of an audio file somehow.

What am I doing wrong? Thank you.

edit

A better formulation of the question would be: is the FFT of any good in music processing? For example similarity of 2 pieces.

As pointed out in the comments, my plain approach is way too slow.

Thank you.


Solution

  • To considerably speed up the fft portion of your analysis, you can zero-pad out your data to a power of 2:

    import scipy
    import numpy as np
    import matplotlib.pyplot as plt
    
    # rate, aud_data = scipy.io.wavfile.read(file)
    rate, aud_data = 44000, np.random.random((9218368,))
    
    len_data = len(aud_data)
    
    channel_1 = np.zeros([2**(int(np.ceil(np.log2(len_data)))),1])
    channel_1[0:len_data] = aud_data
    
    fourier = np.fft.fft(channel_1)
    

    Here is an example of plotting the real component of the fourier transform of a few sine waves using the above method:

    import numpy as np
    import matplotlib.pyplot as plt
    
    # rate, aud_data = scipy.io.wavfile.read(file)
    rate = 44000
    ii = np.arange(0, 9218368)
    t = ii / rate
    aud_data = np.zeros(len(t))
    for w in [1000, 5000, 10000, 15000]:
        aud_data += np.cos(2 * np.pi * w * t)
    
    # From here down, everything else can be the same
    len_data = len(aud_data)
    
    channel_1 = np.zeros(2**(int(np.ceil(np.log2(len_data)))))
    channel_1[0:len_data] = aud_data
    
    fourier = np.fft.fft(channel_1)
    w = np.linspace(0, 44000, len(fourier))
    
    # First half is the real component, second half is imaginary
    fourier_to_plot = fourier[0:len(fourier)//2]
    w = w[0:len(fourier)//2]
    
    plt.figure(1)
    
    plt.plot(w, fourier_to_plot)
    plt.xlabel('frequency')
    plt.ylabel('amplitude')
    plt.show()