Tags: ruby, audio, mp3, fft, wav

Extract Fast Fourier Transform data from file


I am building a tool which is supposed to run on a server and analyze sound files. I want to do this in Ruby as all my other tools are written in Ruby as well. But I am having trouble finding a good way of accomplishing this.

A lot of the examples I've found are visualizers and other graphical tools. I just need the FFT data, nothing more. I need to both get the audio data and do an FFT on it. My end goal is to calculate statistics such as the mean/median/mode, 25th percentile, and 75th percentile over all frequencies (weighted by amplitude), the BPM, and perhaps some other useful characteristics, so that I can later cluster similar sounds together.

First I tried to use ruby-audio and fftw3, but I never got the two to really work together. The documentation wasn't good either, so I really didn't know what data was being shuffled around. Next I tried to use bplay / brec and limit my Ruby script to just reading STDIN and performing an FFT on that (still using fftw3). But I couldn't get bplay/brec to work, since the server doesn't have a sound card and I didn't manage to get the audio directly to STDOUT without going through an audio device first.

Here's the closest I've gotten:

# extracting audio from wav with ruby-audio
buf = RubyAudio::Buffer.float(1024)
RubyAudio::Sound.open(fname) do |snd|
    while snd.read(buf) != 0
        # ???
    end
end

# performing FFT on audio
def get_fft(input, window_size)
    data = input.read(window_size).unpack("s*")
    na = NArray.to_na(data)
    fft = FFTW3.fft(na).to_a[0, window_size/2]
    return fft
end

So now I'm stuck and can't find any more good results on Google. So perhaps you SO guys can help me out?

Thanks!


Solution

  • Here's the final solution to what I was trying to achieve, thanks to Randall Cook's helpful advice. The code below extracts the waveform and the FFT of a WAV file in Ruby:

    require "ruby-audio"
    require "fftw3"
    
    fname = ARGV[0]
    window_size = 1024
    wave = Array.new
    # one independent array per frequency bin (Array.new(n, []) would share a single array)
    fft = Array.new(window_size/2) { [] }
    
    begin
        buf = RubyAudio::Buffer.float(window_size)
        RubyAudio::Sound.open(fname) do |snd|
            while snd.read(buf) != 0
                wave.concat(buf.to_a)
                na = NArray.to_na(buf.to_a)
                fft_slice = FFTW3.fft(na).to_a[0, window_size/2]
                fft_slice.each_with_index { |x, j| fft[j] << x }
            end
        end
    
    rescue => err
        warn "error reading audio file: #{err.message}"
        exit 1
    end
    
    # now I can work on analyzing the "fft" and "wave" arrays...
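
As a possible starting point for that analysis step, here is a minimal sketch of the amplitude-weighted statistics mentioned in the question, computed over a single FFT slice. It uses only the Ruby standard library. The helper names (`bin_frequency`, `weighted_mean`, `weighted_percentile`, `weighted_mode`) are hypothetical and not part of ruby-audio or fftw3, the sample rate is an assumed value that would normally come from the file's metadata, and the sketch assumes each element of a slice is a Ruby `Complex`, so that `abs` gives the bin's magnitude.

```ruby
# Hypothetical helpers (not part of ruby-audio or fftw3): summary statistics
# over one FFT slice, weighting each bin's frequency by its magnitude.

# Frequency in Hz of bin k for a given sample rate and window size.
def bin_frequency(k, sample_rate, window_size)
  k * sample_rate.to_f / window_size
end

# Magnitude-weighted mean frequency (often called the spectral centroid).
def weighted_mean(freqs, mags)
  total = mags.sum
  return 0.0 if total.zero?
  freqs.zip(mags).sum { |f, m| f * m } / total
end

# Lowest bin frequency at which the cumulative magnitude reaches
# fraction q (0.0..1.0) of the total magnitude.
def weighted_percentile(freqs, mags, q)
  target = mags.sum * q
  running = 0.0
  freqs.zip(mags).each do |f, m|
    running += m
    return f if running >= target
  end
  freqs.last
end

# Frequency of the strongest bin (the weighted "mode").
def weighted_mode(freqs, mags)
  freqs[mags.index(mags.max)]
end

# Usage with made-up data standing in for one slice of FFT output.
sample_rate = 8000 # assumed value; take the real one from the sound file
window_size = 8
fft_slice = [Complex(0, 0), Complex(3, 4), Complex(0, 1), Complex(1, 0)]

mags  = fft_slice.map(&:abs)
freqs = (0...fft_slice.size).map { |k| bin_frequency(k, sample_rate, window_size) }

puts "mean:   #{weighted_mean(freqs, mags)}"
puts "median: #{weighted_percentile(freqs, mags, 0.5)}"
puts "p25:    #{weighted_percentile(freqs, mags, 0.25)}"
puts "p75:    #{weighted_percentile(freqs, mags, 0.75)}"
puts "mode:   #{weighted_mode(freqs, mags)}"
```

Only the first `window_size/2` bins are used here, matching the slice taken in the solution above: for real-valued input the upper half of the FFT output mirrors the lower half. To summarize a whole file rather than one window, the same helpers could be applied to magnitudes averaged per bin across all slices.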