Differentiate between local max as part of peak and absolute max of peak

I have taken amplitude data from a 10-second clip of an mp3 and performed a fast Fourier transform (FFT) on it to get the clip's data in the frequency domain (shown in the first figure). I would now like to determine which frequencies the peaks are located at.

Amplitude to Frequency

I started by smoothing the data, which can be seen below in the blue and red plots. I created a threshold that the peaks must be over in order to be considered. This is the horizontal blue line on the third plot below. As can be seen, my peak detection code worked, to an extent.

Smoothing and peak detection

The problem I am having now is evident in the final plot shown below. My code is also marking minor local maxima that are part of a larger peak. I need a way to filter these out so that I get only a single marker for each peak. That is, for the peak shown below, I only want a marker at the absolute peak, not at each minor peak along the way.

Enlarged view of peak detection

My peak detection code is shown below:

max_locations = []  # indices of the detected local maxima
# skip the first and last points so that i-1 and i+1 stay in range
for i in range(1, len(xavg) - 1):  # xavg contains all the smoothed data points
    # point must be above the threshold and greater than the points on either side
    if xavg[i] > threshold and xavg[i] > xavg[i-1] and xavg[i] > xavg[i+1]:
        max_locations.append(i)

EDIT: I think I didn't state my problem clearly enough. I want to find the locations of the 5 or so highest spikes on the plot, not just the highest point overall. I am basically trying to give the clip an audio fingerprint by marking its dominant frequencies.

EDIT2: Some more code to help show what I'm doing with regards to the FFT and smoothing:

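import numpy as np

def movingaverage(interval, window_size):
    # boxcar (uniform) window, normalised so its weights sum to 1
    window = np.ones(int(window_size))/float(window_size)
    # 'same' keeps the output the same length as the input
    return np.convolve(interval, window, 'same')

fft = np.fft.rfft(song)  # song holds the clip's amplitude samples
xavg = movingaverage(abs(fft), 21)  # smooth the magnitude spectrum over 21 bins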

Solution

Peak finding is tricky, and I would avoid implementing it yourself if possible. Try scipy.signal.find_peaks_cwt; it has a few parameters you can tune. With this function I don't think you need to smooth the data beforehand, since one of its parameters (widths) is essentially a list of length scales over which to smooth the data. Roughly speaking, the algorithm smooths the data on one length scale and looks for peaks, then smooths on another length scale and looks for peaks again, and so on. It then keeps the peaks that appear at all or most length scales.
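
As a rough illustration of that suggestion, here is a minimal sketch that runs find_peaks_cwt and then keeps only the few tallest candidates, which is what the "5 or so highest spikes" requirement asks for. The helper name dominant_peaks, the synthetic spectrum, and the widths range are all assumptions made for this example; they are not taken from the question's data, and the widths would need tuning to match how wide the real peaks are (in FFT bins).

```python
import numpy as np
from scipy.signal import find_peaks_cwt


def dominant_peaks(spectrum, widths, n_peaks=5):
    """Return the indices of the n_peaks tallest peaks found by find_peaks_cwt.

    Hypothetical helper for illustration; not part of SciPy.
    """
    spectrum = np.asarray(spectrum, dtype=float)
    # widths: the length scales (in samples) at which peaks are searched for
    candidates = np.asarray(find_peaks_cwt(spectrum, widths), dtype=int)
    if candidates.size == 0:
        return candidates
    # Rank the candidates by spectrum height and keep the tallest n_peaks
    order = np.argsort(spectrum[candidates])[::-1][:n_peaks]
    # Return them sorted by position
    return np.sort(candidates[order])


# Synthetic stand-in for abs(fft): three Gaussian bumps plus a little noise
# (assumed data, only here to make the sketch self-contained)
rng = np.random.default_rng(0)
x = np.arange(2000)
true_centres = [300, 900, 1500]
heights = [5.0, 10.0, 7.0]
spectrum = sum(h * np.exp(-0.5 * ((x - c) / 15.0) ** 2)
               for h, c in zip(heights, true_centres))
spectrum = spectrum + 0.05 * rng.random(x.size)

peak_idx = dominant_peaks(spectrum, widths=np.arange(10, 40), n_peaks=3)
print(peak_idx)
```

Ranking the candidates by height afterwards also discards any low spurious peaks the detector reports in flat regions. To turn the returned indices into frequencies, np.fft.rfftfreq(len(song), d=1.0/sample_rate) gives the frequency of each rfft bin, where sample_rate is the clip's sampling rate (not stated in the question).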