python, python-3.x, audio, wav

Trying to get the frequencies of a .wav file in Python


What I'm trying to do seems simple: I want to know exactly which frequencies are present in a .wav file at given times, i.e. "from n milliseconds to n + 10 milliseconds, the average frequency of the sound was x hertz". I have seen people mention Fourier transforms and the Goertzel algorithm, as well as various modules, but I can't figure out how to get any of them to do what I've described.

What I'm looking for is a solution like this pseudocode, or at least one that will do something like what the pseudocode is getting at:

import some_module_that_can_help_me_do_this as freq

file = 'output.wav'
start_time = 1000  # Start 1000 milliseconds into the file
end_time = 1010  # End 10 milliseconds thereafter

print("Average frequency = " + str(freq.average(start_time, end_time)) + " Hz")

I don't come from a mathematics background, so I don't want to have to understand the implementation details.


Solution

  • If you'd like to detect the pitch of a sound (and it seems you do), your best bet among Python libraries is aubio. Please consult this example for an implementation.

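    import sys
    import numpy as np
    from aubio import source, pitch
    
    filename = sys.argv[1]  # path to the .wav file, e.g. 'output.wav'
    
    win_s = 4096  # analysis window size, in samples
    hop_s = 512   # number of samples read per iteration
    
    # Passing 0 as the sample rate makes aubio use the file's own rate
    s = source(filename, 0, hop_s)
    samplerate = s.samplerate
    
    tolerance = 0.8
    
    pitch_o = pitch("yin", win_s, hop_s, samplerate)
    pitch_o.set_unit("Hz")  # report hertz rather than MIDI note numbers
    pitch_o.set_tolerance(tolerance)
    
    pitches = []
    confidences = []
    
    total_frames = 0
    while True:
        samples, read = s()
        # One pitch estimate (and its confidence) per hop
        pitches.append(pitch_o(samples)[0])
        confidences.append(pitch_o.get_confidence())
        total_frames += read
        if read < hop_s:  # a short read means the end of the file
            break
    
    print("Average frequency = " + str(np.array(pitches).mean()) + " Hz")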
    

    Be sure to check the docs on the available pitch detection methods.
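The loop above yields one pitch estimate per hop, so the average over a specific time range (as asked in the question) can be recovered by mapping hop indices to timestamps. The helper below is a minimal sketch of that idea and is not part of aubio: `average_pitch_in_window` and its `min_confidence` parameter are hypothetical names, and it assumes that a higher confidence value means a more reliable estimate. The result is in whatever unit was selected with `set_unit`.

```python
def average_pitch_in_window(pitches, confidences, hop_s, samplerate,
                            start_ms, end_ms, min_confidence=0.0):
    """Average the pitch estimates whose hop starts in [start_ms, end_ms).

    Returns None when no estimate falls inside the window.
    """
    hop_ms = 1000.0 * hop_s / samplerate  # duration of one hop in milliseconds
    selected = [
        p for i, (p, c) in enumerate(zip(pitches, confidences))
        if start_ms <= i * hop_ms < end_ms and c >= min_confidence
    ]
    if not selected:
        return None
    return sum(selected) / len(selected)
```

Note that with `hop_s = 512` at 44100 Hz there is only one estimate about every 11.6 ms, and each estimate is computed from a 4096-sample window (roughly 93 ms), so a 10 ms range may contain zero or one value. A smaller hop gives denser estimates, but each one still reflects the whole analysis window.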

    I also thought you might be interested in estimating the mean frequency and some other audio parameters without any special libraries. Let's just use numpy! This should give you much better insight into how such audio features can be calculated. It's based on specprop from the seewave package. Check its docs for the meaning of the computed features.

    import numpy as np
    
    def spectral_properties(y: np.ndarray, fs: int) -> dict:
        # Magnitude spectrum and the frequency (in Hz) of each bin
        spec = np.abs(np.fft.rfft(y))
        freq = np.fft.rfftfreq(len(y), d=1 / fs)
        amp = spec / spec.sum()
        mean = (freq * amp).sum()
        sd = np.sqrt(np.sum(amp * ((freq - mean) ** 2)))
        amp_cumsum = np.cumsum(amp)
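        # Quantiles: first bin whose cumulative amplitude exceeds the threshold.
        # Indexing is 0-based here, so the "+ 1" of the R original is not needed.
        median = freq[len(amp_cumsum[amp_cumsum <= 0.5])]
        mode = freq[amp.argmax()]
        Q25 = freq[len(amp_cumsum[amp_cumsum <= 0.25])]
        Q75 = freq[len(amp_cumsum[amp_cumsum <= 0.75])]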
        IQR = Q75 - Q25
        z = amp - amp.mean()
        w = amp.std(ddof=1)  # sample standard deviation, matching R's sd()
        skew = ((z ** 3).sum() / (len(spec) - 1)) / w ** 3
        kurt = ((z ** 4).sum() / (len(spec) - 1)) / w ** 4
    
        result_d = {
            'mean': mean,
            'sd': sd,
            'median': median,
            'mode': mode,
            'Q25': Q25,
            'Q75': Q75,
            'IQR': IQR,
            'skew': skew,
            'kurt': kurt
        }
    
        return result_d
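To connect this back to the question's "from n milliseconds to n + 10 milliseconds" framing, here is a minimal sketch that cuts a time window out of a WAV file with the standard-library `wave` module and computes the amplitude-weighted mean and the dominant frequency of that window. The function names `read_window` and `window_frequencies` are made up for illustration, the reader assumes 16-bit PCM audio, and the Hann window and mean removal are additions that the function above does not use.

```python
import wave

import numpy as np


def read_window(path, start_ms, end_ms):
    """Return (samples, sample_rate) for [start_ms, end_ms) of a 16-bit PCM WAV."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM files")
        fs = wf.getframerate()
        n_channels = wf.getnchannels()
        start = int(start_ms * fs / 1000)
        stop = int(end_ms * fs / 1000)
        wf.setpos(start)                  # seek to the first frame of the window
        raw = wf.readframes(stop - start)
    data = np.frombuffer(raw, dtype="<i2").astype(np.float64)
    # Channels are interleaved; average them down to mono
    return data.reshape(-1, n_channels).mean(axis=1), fs


def window_frequencies(y, fs):
    """Return (amplitude-weighted mean, dominant frequency) in Hz."""
    y = y - y.mean()                      # drop the DC offset
    spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freq = np.fft.rfftfreq(len(y), d=1 / fs)
    total = spec.sum()
    if total == 0:                        # silent window: nothing to measure
        return float("nan"), float("nan")
    return float((freq * spec).sum() / total), float(freq[spec.argmax()])


# Usage, with the file name from the question:
#   y, fs = read_window("output.wav", 1000, 1010)
#   mean_hz, peak_hz = window_frequencies(y, fs)
```

Keep in mind that the frequency resolution of an FFT is the inverse of the window length, so a 10 ms window can only resolve frequencies in steps of about 100 Hz. Longer windows give finer frequency resolution at the cost of time resolution.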