python numpy signal-processing fft spectrum

Spectrum analyzer of wave files with numpy.fft.rfft

I'm writing a script in Python that processes a wave file and displays a spectrum analyzer, just as a nice visualization of audio files. After reading the docs and some forum threads, I concluded that I needed to use rfft.

I'm processing blocks of 2048 samples, which gives 1025 frequency bins (2048/2 + 1) in the rfft output. For my purposes I need to reduce the number of bands dramatically, to 12 bands (one octave each). Since I'm processing audio files and have a limited number of bands, I wonder if there is a smart way to group frequencies so that 90% of songs look nice, with low-pitched beats on the far left and high-pitched voices/shouts/notes on the far right.
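A minimal sketch of how 2048-sample blocks like these might be read with the standard-library `wave` module. It assumes 16-bit PCM and averages multichannel audio down to mono; `read_blocks` is a hypothetical helper, not something from the question.

```python
import wave

import numpy as np


def read_blocks(path, block_size=2048):
    """Yield successive mono blocks of `block_size` samples from a WAV file.

    Assumption: 16-bit PCM. Multichannel audio is averaged down to mono.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        channels = wav.getnchannels()
        while True:
            raw = wav.readframes(block_size)
            # WAV data is little-endian; each frame holds one int16 per channel.
            samples = np.frombuffer(raw, dtype="<i2")
            if samples.size < block_size * channels:
                break  # end of file: drop the final partial block
            # Reshape to (frames, channels) and average across channels.
            yield samples.reshape(-1, channels).mean(axis=1)
```

Dropping the last partial block keeps every block the same length, so every rfft call returns the same number of bins.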

With the preliminary code below I get far more bands than I need. Also, for most songs the peaks are concentrated in the low frequencies. The exception is a test sweep from 20 Hz to 20 kHz. With this sweep I also noticed that the higher the pitch, the lower the amplitude.

import numpy

def fft(self, sample_range):
    # sample_range is a block of 2048 ints read from the self.file wave file
    # rfft of N real samples returns N // 2 + 1 bins; abs() gives their magnitudes
    fft_data = numpy.abs(numpy.fft.rfft(sample_range))
    # rfftfreq with d = 1 / sample rate gives each bin's frequency directly in Hz
    freq_hz = numpy.fft.rfftfreq(len(sample_range), d=1.0 / self.file.getframerate())

    pairs = list(zip(freq_hz, fft_data))
    print(len(pairs), len(freq_hz), len(fft_data), pairs)

Here is the printed output for the first block of the sweep (~20 Hz):

1025 1025 1025 [(0.0, 1850501.0), (21.533203125, 2779524.1730200453), (43.06640625, 15469093.29481476), ... (22028.466796875, 3538.1225240980043), (22050.0, 3553.0)]
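The numbers in that output follow directly from the block size and the sample rate. rfft of N real samples returns N/2 + 1 bins spaced fs/N apart. With N = 2048 and fs = 44100 Hz (inferred from the 22050 Hz top bin), that gives 1025 bins about 21.53 Hz apart. A short check with a synthetic 1 kHz tone (the tone is an illustrative assumption):

```python
import numpy as np

fs = 44100  # sample rate in Hz, consistent with the 22050 Hz top bin above
n = 2048    # block size in samples

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
print(len(freqs))  # n // 2 + 1 bins
print(freqs[1])    # bin spacing fs / n, about 21.53 Hz

# A 1 kHz test tone peaks in a bin close to 1000 Hz.
t = np.arange(n) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)
peak = freqs[np.argmax(np.abs(np.fft.rfft(tone)))]
print(peak)
```

One consequence is that no bin falls strictly between 0 Hz and about 21.5 Hz. Any band whose upper edge lies in that range (other than the DC bin) is therefore empty.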

So my questions are:

1. Am I doing something I shouldn't in the code above? =)
2. What units do the spectrum analyzers in most music players usually display, and over what ranges? Should I convert amplitudes to dB?
3. Is there a simple way to reduce the number of bands to 12? I guess the bandwidth grows exponentially with pitch, so I assume I need to implement this exponential grouping manually.

EDIT: I'm now summing the FFT bins using a reference log scale that I generate for an arbitrary number of bands with:

In [22]: num_bands = 10
In [23]: [44100*2**(b-num_bands) for b in range(num_bands)]
Out[23]: [43.06640625,  86.1328125,  172.265625,  344.53125,  689.0625,  1378.125,  2756.25,  5512.5,  11025.0,  22050.0]

In [24]: num_bands = 12
In [25]: [44100*2**(b-num_bands) for b in range(num_bands)]
Out[25]: [10.7666015625,  21.533203125,  43.06640625,  86.1328125,  172.265625,  344.53125,  689.0625,  1378.125,  2756.25,  5512.5,  11025.0,  22050.0]

I use these as the upper frequency of each band. It works up to num_bands = 10. From 11 bands on, I start getting very low frequencies outside the audible range. Any idea how to compress the range better than this? The upper frequency of the first band should be at least 40 Hz in any case.
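One way to keep the first band's upper edge fixed at a chosen minimum (40 Hz, per the requirement above) regardless of the band count is to space the edges geometrically between that minimum and the Nyquist frequency, instead of repeatedly halving down from Nyquist. Below is a minimal sketch using `numpy.geomspace`. `band_edges` and `min_hz` are hypothetical names, and the 44100 Hz default is an assumption taken from the example above.

```python
import numpy as np


def band_edges(num_bands, fs=44100, min_hz=40.0):
    """Upper edge of each band, log-spaced from min_hz up to Nyquist (fs / 2).

    Consecutive edges share a constant ratio, so each band spans the same
    musical interval. That interval is exactly one octave only when the ratio is 2.
    """
    # geomspace includes both endpoints: first edge = min_hz, last = fs / 2.
    return np.geomspace(min_hz, fs / 2, num_bands)
```

With this spacing, asking for more bands makes each band narrower than an octave. The bands do not extend below the audible range. For example, 12 bands between 40 Hz and 22050 Hz give a ratio of roughly 1.78 between consecutive edges.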

Solution

Yes, spectrum displays very often convert magnitudes to dB (or to some other logarithmic scale).
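A minimal sketch of that conversion. Because rfft magnitudes are amplitudes, 20·log10 applies; for power (squared magnitude) you would use 10·log10 instead. The reference level `ref` and the clipping floor `floor_db` are hypothetical parameters. A common display choice is a `ref` equal to the largest magnitude you expect, so the loudest band sits at 0 dB.

```python
import numpy as np


def to_db(magnitudes, ref=1.0, floor_db=-120.0):
    """Convert linear amplitude magnitudes to decibels relative to `ref`.

    Values are clipped at `floor_db`, so silent bins (magnitude 0) map to
    floor_db instead of -inf.
    """
    magnitudes = np.asarray(magnitudes, dtype=float)
    # Linear amplitude that corresponds to floor_db relative to ref.
    tiny = ref * 10.0 ** (floor_db / 20.0)
    return 20.0 * np.log10(np.maximum(magnitudes, tiny) / ref)
```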

The simplest way to reduce the number of bands is to add adjacent FFT bins together in groups. Each group should have roughly the same ratio between the highest and lowest frequency it covers, for example one group per octave (or per half octave, per twelfth of an octave, etc.). Choose that ratio so the groups come out large or small enough to give the desired total number of bands.
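A minimal sketch of that grouping. It assumes `freqs` comes from `numpy.fft.rfftfreq` and that the band upper edges are ascending, for example the log-spaced list from the EDIT in the question. `group_bands` is a hypothetical helper name. Summing magnitudes (rather than powers) follows the question's own approach.

```python
import numpy as np


def group_bands(magnitudes, freqs, upper_edges):
    """Sum FFT bin magnitudes into bands defined by ascending upper edges.

    Bin i goes into the first band whose upper edge is >= freqs[i].
    Bins above the last edge are ignored. Empty bands sum to 0.
    """
    magnitudes = np.asarray(magnitudes, dtype=float)
    upper_edges = np.asarray(upper_edges, dtype=float)
    # side="left" returns the index of the first edge >= each frequency.
    band_index = np.searchsorted(upper_edges, freqs, side="left")
    keep = band_index < len(upper_edges)
    # bincount with weights adds up the magnitudes falling into each band.
    return np.bincount(band_index[keep], weights=magnitudes[keep],
                       minlength=len(upper_edges))
```

Because `minlength` is set, the result always has one entry per band, even when a low band contains no bins.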