Tags: java, audio, fft, sampling, frequency-analysis

Getting frequencies and magnitudes from audio sample using FFT


I am currently working on a music visualization LED strip display (similar to this) but got stuck on extracting the audio frequencies / magnitudes from the sample data.

The entire process involves quite a few steps, and I am not sure at which point exactly I am failing, so please bear with me. I start by reading audio from a TargetDataLine configured with an AudioFormat.

AudioSystem.getTargetDataLine(new AudioFormat(8000.0f, 16, 1, true, true));

Next, I read the data as described in this tutorial. The only difference is that I am using a smaller buffer, reading 64 bytes at a time.
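
For completeness, the read itself is just the standard TargetDataLine usage, roughly like this (exception handling omitted, variable names are my own):

final AudioFormat format = new AudioFormat(8000.0f, 16, 1, true, true);
final TargetDataLine line = AudioSystem.getTargetDataLine(format);
line.open(format);
line.start();

final byte[] buffer = new byte[64]; // 32 big-endian 16-bit samples per read
line.read(buffer, 0, buffer.length); // blocks until the 64 bytes have been captured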

Here is an example of data read in this way from a 500 Hz test tone.

final byte[] buffer = { // 32 16-bit samples
        30, 111, 43, -19, 50, -74, 49, -54,
        41, 70, 26, 121, 7, -90, -13, -89,
        -31, -118, -44, 17, -51, 75, -50, 60,
        -42, -62, -27, -115, -8, 91, 12, 85,
        30, 106, 43, -34, 50, -93, 49, -76,
        41, 52, 26, 108, 7, -93, -13, -84,
        -31, -104, -44, 38, -51, 98, -50, 84,
        -42, -42, -27, -98, -8, 99, 12, 83
};

Next, I transform the sampled bytes into an array of double values in the range [-1, 1]. I took inspiration from the aforementioned tutorial and this code.

final double[] samples = new double[buffer.length];
int sampleIndex = 0;
int byteIndex = 0;
while (byteIndex < buffer.length) {
    int low = buffer[byteIndex];
    ++byteIndex;
    int high = buffer[byteIndex];
    ++byteIndex;

    int sampleIntValue = ((high << 8) + (low & 0x00ff));
    double maxSampleValue = 32768;
    samples[2 * sampleIndex] = ((double) sampleIntValue) / maxSampleValue; // Transforming to [-1, 1]
    samples[(2 * sampleIndex) + 1] = 0; // Imaginary part
    ++sampleIndex;
}

Next, I perform a Fast Fourier Transform on the data using the edu.emory.mathcs:JTransforms:2.4 library; complexForward transforms the interleaved real/imaginary array in place.

final DoubleFFT_1D fft = new DoubleFFT_1D(samples.length / 2);
fft.complexForward(samples);

Then, I calculate frequencies and magnitudes as described in this code.

final float sampleRate = 8000f;
final Map<Double, Double> frequenciesToMagnitudes = IntStream.range(0, samples.length / 2)
        .boxed()
        .collect(Collectors.toMap(
                index -> 2 * ((double) index / (double) samples.length) * sampleRate,
                index -> Math.log10(
                        Math.sqrt(
                                Math.pow(samples[2 * index], 2) // real part
                                        + Math.pow(samples[(2 * index) + 1], 2) // imaginary part
                        )
                )
        ));
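
For reference, with 32 complex samples at a sample rate of 8000 Hz the bins come out 8000 / 32 = 250 Hz apart, so the 500 Hz test tone should land in bin 2, with its mirror image in bin 30 near the top of the spectrum.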

I find the maximum magnitude so I can scale the displayed values accordingly.

final double maximumMagnitude = frequenciesToMagnitudes.values().stream().max(Double::compare).orElse(0d);

And finally, I display the results (taller block represents a brighter LED).

final char[] magnitudeDisplay = {'▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'};
frequenciesToMagnitudes.entrySet().stream().sorted(Map.Entry.comparingByKey()).forEach(frequencyToMagnitude -> {
    final double magnitudePercentage = frequencyToMagnitude.getValue() / maximumMagnitude;
    final int characterToDisplayIndex = Math.min(magnitudeDisplay.length - 1, Math.max(0, (int) (magnitudePercentage * magnitudeDisplay.length)));
    final char characterToDisplay = magnitudeDisplay[characterToDisplayIndex];
    System.out.print(characterToDisplay);
});
System.out.println();

From all this, I would expect to see two distinct spikes (my frequency + alias region), but instead there are 8.

▁▁▅▁▁▁█▁▁▁▄▁▁▁▅▁▁▁▅▁▁▁▄▁▁▁█▁▁▁▅▁

The number of spikes changes depending on the frequency of the audio playback (sometimes there are two, sometimes four), and the lower the frequency, the more spikes I see.

My question is: how do I extract frequency / magnitude pairs from audio data?


Solution

  • After a bit of digging I found the OpenIMAJ library, which does exactly what I need. Here are the differences between my code and theirs.

    1. Transforming from bytes to floating point is incorrect: all the examples use a big-endian audio format but take the bytes in the wrong order. It should be as follows (a consolidated sketch of the corrected conversion appears after this list):
    int high = buffer[byteIndex];
    ++byteIndex;
    int low = buffer[byteIndex];
    ++byteIndex;
    
    2. The FFT size is padded to the next power of 2. This was not a problem for me since I always chose a buffer whose size is a power of 2, but if yours is not, this step is necessary.
    final int fftSize = (int) Math.pow(2, 32 - Integer.numberOfLeadingZeros(numberOfSamples - 1));
    
    3. The values are transformed to floating point a bit differently.
    samples[2 * sampleIndex] = (float) sampleIntValue * Integer.MAX_VALUE / Short.MAX_VALUE;
    
    4. Real parts are normalized.
    for (int i = 0; i < samples.length; i += 2) samples[i] /= fftSize;
    
    5. No logarithms are used when computing the magnitudes.
    final Map<Float, Float> frequenciesToMagnitudes = IntStream.range(0, samples.length / 4)
            .boxed()
            .collect(Collectors.toMap(
                    index -> (((float) (index * 2)) * (sampleRate / (float) numberOfSamples)) / 2,
                    index -> {
                        final float realPart = samples[index * 2];
                        final float imaginaryPart = samples[index * 2 + 1];
                        return (float) Math.sqrt(realPart * realPart + imaginaryPart * imaginaryPart);
                    }
            ));
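
    Putting points 1–3 together, the corrected conversion looks roughly like this (my own consolidation, not code taken from OpenIMAJ; I use the float variant of the JTransforms FFT to match the float samples):

    final int numberOfSamples = buffer.length / 2;
    final int fftSize = (int) Math.pow(2, 32 - Integer.numberOfLeadingZeros(numberOfSamples - 1));
    final float[] samples = new float[fftSize * 2]; // interleaved real/imaginary, zero-padded to fftSize

    int sampleIndex = 0;
    int byteIndex = 0;
    while (byteIndex < buffer.length) {
        final int high = buffer[byteIndex]; // big-endian: high byte comes first
        ++byteIndex;
        final int low = buffer[byteIndex];
        ++byteIndex;

        final int sampleIntValue = (high << 8) + (low & 0x00ff);
        samples[2 * sampleIndex] = (float) sampleIntValue * Integer.MAX_VALUE / Short.MAX_VALUE;
        samples[(2 * sampleIndex) + 1] = 0; // imaginary part
        ++sampleIndex;
    }

    final FloatFFT_1D fft = new FloatFFT_1D(fftSize); // transform sized to the padded length
    fft.complexForward(samples);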
    

    After all the changes, the output appears to be much more reasonable:

    ▁▁▂▁▁▁█▁▁▁▂▁▁▁▂▁
    

    I am still not entirely sure if the frequency calculation is correct, though.