Tags: rust, fft, tauri

How does FFT work? Can't seem to get the frequency spectrum for each frame in Rust


I am only getting 735 frequency values for each 0.03 seconds of audio data.

I am trying to figure out how I can get the frequency spectrum of each frame. The code below only returns 735 values per frame (because samples_in_frame is 735, which is the number of samples in each frame), but I want the whole spectrum up to ~20,000 Hz for each 0.03 seconds of samples. How would I go about doing this?

path: path to the wave file

second_start: start point to process in seconds, 0

second_length: how long to process in seconds, 10

frame_rate: how many frames in each second, 60

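use rustfft::{num_complex::Complex, FftPlanner};

// Runs a forward FFT over one frame and returns the magnitude of each bin.
fn fft_this(mut buffer: Vec<Complex<f64>>, samples_in_frame: usize) -> Vec<f64> {
    // Planning on every call is wasteful; reuse one planner if this runs often.
    let mut planner = FftPlanner::new();
    let fft = planner.plan_fft_forward(samples_in_frame);

    // In-place transform; the buffer length must match the planned FFT length.
    fft.process(&mut buffer);

    // The magnitude (norm) of each complex bin is the spectrum value.
    buffer[..samples_in_frame].iter().map(|c| c.norm()).collect()
}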

#[tauri::command]
pub fn analyze(
    path: &str,
    second_start: f64,
    second_length: f64,
    frame_rate: f64,
) -> Vec<Vec<f64>> {
    let wave_file: Wave64 = Wave64::load(path).expect("Could not load wave.");

    let samples_start: f64 = second_start * wave_file.sample_rate();
    let samples_in_each_frame: usize = (wave_file.sample_rate() / frame_rate) as usize;

    let frames_in_length: usize = (second_length * frame_rate) as usize;

    let mut wave_buffer: Vec<Vec<f64>> = vec![];

    for frame_index in 0..frames_in_length {
        let mut frame_buffer: Vec<Complex<f64>> = vec![];

        for sample_index in 0..samples_in_each_frame {
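            // Absolute sample position: start offset, plus frame offset, plus position inside the frame.
            let buffer_index: usize =
                samples_start as usize + frame_index * samples_in_each_frame + sample_index;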

            frame_buffer.push(Complex {
                re: wave_file.at(0, buffer_index),
                im: 0.0,
            });
        }

        let processed_frame: Vec<f64> = fft_this(frame_buffer, samples_in_each_frame);

        wave_buffer.push(processed_frame);
    }

    return wave_buffer;
}

Solution

  • I think you need more background information about FFTs in general.

    If you have 735 data points, the FFT can only decompose them into 735 orthogonal frequencies.

    Let's assume those 735 points represent 1 second. Then:

    • The first FFT value is the DC part, 0 Hz, and the average of all values.
    • The second one is 1 Hz, the slowest contained frequency, meaning one full cycle during the sampling period.
    • The next one is 2 Hz, meaning two full cycles during the sampling period.
    • ...
    • 367 Hz, meaning 367 full cycles
    • 368 Hz, meaning 368 full cycles. IMPORTANT: Due to aliasing (see "Sampling Theorem"), this is identical to -367 Hz, that is, 367 Hz rotating in the opposite direction!
    • -366 Hz
    • -365 Hz
    • ...
    • -1 Hz

    In total those are 735 frequencies. There isn't more information in the signal.
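The bin layout listed above can be sketched as a small helper. This is only an illustration: `bin_to_frequency` and its parameters are hypothetical names, not part of rustfft, and the function assumes the standard FFT output ordering (DC first, then positive frequencies, then negative ones).

```rust
/// Maps FFT bin `k` of an `n`-point transform to a signed frequency in Hz,
/// where `duration_secs` is the time span covered by the `n` samples.
/// (Hypothetical helper for illustration only.)
fn bin_to_frequency(k: usize, n: usize, duration_secs: f64) -> f64 {
    // Bins above n / 2 wrap around to negative frequencies.
    let cycles = if k <= n / 2 {
        k as f64
    } else {
        k as f64 - n as f64
    };
    // One cycle per frame corresponds to 1 / duration_secs Hz.
    cycles / duration_secs
}

fn main() {
    let n = 735;
    // Walk through the interesting bins of the 1-second example.
    for k in [0, 1, 2, 367, 368, 734] {
        println!("bin {k:3} -> {:8.2} Hz", bin_to_frequency(k, n, 1.0));
    }
}
```

Passing a different `duration_secs` rescales every bin by the same factor, since the spacing between bins is always one over the frame duration.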

    In your case, as your time period is not 1 second but 0.03 seconds, you need to multiply your frequencies by 1/0.03 = 33.33. So you get:

    • 0 Hz
    • 33.33 Hz
    • 66.67 Hz
    • ...
    • 12200 Hz
    • 12233.33 Hz
    • -12233.33 Hz
    • -12200 Hz
    • ...
    • -66.67 Hz
    • -33.33 Hz

    There simply isn't more information in your samples.
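To see this on a concrete frame, here is a minimal sketch that uses a naive DFT in place of rustfft, so that it runs with the standard library alone. The test tone (30 full cycles per frame) and the names `dft_magnitudes` and `peak_bin` are assumptions made for the example. For real-valued input the negative-frequency bins mirror the positive ones, so only the bins from 0 to n/2 carry distinct magnitudes.

```rust
use std::f64::consts::PI;

/// Naive O(n^2) DFT magnitudes of a real signal.
/// Fine for a demo, far too slow for real use (an FFT gives the same values).
fn dft_magnitudes(signal: &[f64]) -> Vec<f64> {
    let n = signal.len();
    (0..n)
        .map(|k| {
            let (mut re, mut im) = (0.0, 0.0);
            for (t, &x) in signal.iter().enumerate() {
                // Reduce k * t modulo n to keep the angle small and accurate.
                let angle = -2.0 * PI * ((k * t) % n) as f64 / n as f64;
                re += x * angle.cos();
                im += x * angle.sin();
            }
            (re * re + im * im).sqrt()
        })
        .collect()
}

/// Index of the largest magnitude among the non-negative-frequency bins 0..=n/2.
fn peak_bin(magnitudes: &[f64]) -> usize {
    let half = magnitudes.len() / 2;
    (0..=half)
        .max_by(|&a, &b| magnitudes[a].total_cmp(&magnitudes[b]))
        .unwrap()
}

fn main() {
    let n = 735;
    let duration_secs = 0.03; // frame length used in the answer above
    let cycles = 30; // hypothetical test tone: 30 full cycles per frame
    let signal: Vec<f64> = (0..n)
        .map(|t| (2.0 * PI * cycles as f64 * t as f64 / n as f64).sin())
        .collect();

    let mags = dft_magnitudes(&signal);
    let k = peak_bin(&mags);
    println!("peak at bin {k}, i.e. {:.2} Hz", k as f64 / duration_secs);
    println!("mirror bin {}: {:.3} vs {:.3}", n - k, mags[n - k], mags[k]);
}
```

In practice this means a spectrum display only needs the first half of the values returned by the FFT, with bin k mapped to k divided by the frame duration.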

    Additional important info:

    The FFT treats the signal you give it as if it repeated endlessly. So if your 735 samples don't actually repeat (which I guess they don't), you need to apply a window function to reduce the artifacts caused by frequencies that don't fit a whole number of cycles into the frame. For example, a clean 1.5 Hz signal in the 1-second case will smear into neighboring bins (spectral leakage), because applying no window function is equivalent to applying a rectangular window, which has strong side lobes. More info here.
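As a rough sketch of what applying a window looks like, the snippet below multiplies a frame by a Hann window before it would be handed to the FFT. The choice of Hann is an assumption made for the example (the answer does not name a specific window), and `hann` and `apply_hann` are hypothetical helper names.

```rust
use std::f64::consts::PI;

/// Hann window coefficient for sample `i` of an `n`-sample frame (periodic form).
fn hann(i: usize, n: usize) -> f64 {
    0.5 * (1.0 - (2.0 * PI * i as f64 / n as f64).cos())
}

/// Multiplies every sample of the frame by the Hann window, in place.
fn apply_hann(frame: &mut [f64]) {
    let n = frame.len();
    for (i, sample) in frame.iter_mut().enumerate() {
        *sample *= hann(i, n);
    }
}

fn main() {
    // A constant frame makes the window shape itself visible.
    let mut frame = vec![1.0; 8];
    apply_hann(&mut frame);
    println!("{frame:?}");
}
```

In the question's code, the same multiplication could be applied to each sample's real part before it is pushed into the frame buffer. Windowing lowers the overall magnitudes somewhat, which is usually acceptable for a visual spectrum.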

    For visual learners, I can strongly recommend 3Blue1Brown's videos about the Fourier transform.