Search code examples
pythonscipy

Finding the prominent peaks in data


I am trying to find the prominent peaks of some sets of data. One of my data sets looks like the following:

Data Set 1

As you can see in this there is a clear peak at approx w = -1.6. This is a prominent peak as there are no other peaks in the data set that are more than 10% of its height. So I store this w value as it is a prominent peak.

I also have another data set:

Data Set 2

Here you can see I have a large peak at approx w = 1.6. However, there is also a peak at approx w = 0.9 which has a Spectral Power that is larger than 10% of the largest peak. So I do not store this w value as it is not a prominent peak.

I am trying to create a function so that for a given set of Spectral power called y. The function checks if the largest peak is prominent (i.e there are no other peaks that are more than 10% the size of the largest peak). If largest peak is prominent then it prints the w value the peak occurs at. If it is not prominent then it prints("Not prominent").

from scipy.signal import find_peaks, peak_widths, peak_prominences

def check_prominent_peaks(y):
    peaks, _ = find_peaks(y)
        
    peak_heights = y[peaks]
        
    # Find the peak with the maximum height
    max_peak_index = np.argmax(peak_heights)
    max_peak_height = peak_heights[max_peak_index]

    threshold = 10

    # Check if the maximum peak 10 times larger than the others
    if max_peak_height > threshold * np.max(peak_heights[peak_heights != max_peak_height]):
        print(w[peaks][max_peak_index])
            
    else:
        print("Not Prominent")

However, this includes each individual data point as a peak. I want an individual peak to be the combination of the data points. Below is a snippet of the data.

w_data1 = np.array([-3.14159265e+00, -3.03687290e+00, -2.77507351e+00, -2.61799388e+00,
       -2.35619449e+00, -2.09439510e+00, -1.80641578e+00, -1.72787596e+00,
       -1.71740398e+00, -1.61268423e+00, -1.60744824e+00, -1.60221225e+00,
       -1.59697627e+00, -1.59174028e+00, -1.58650429e+00, -1.58126830e+00,
       -1.57079633e+00, -1.56556034e+00, -1.56032435e+00, -1.55508836e+00,
       -1.54985238e+00, -1.51843645e+00, -1.46607657e+00, -1.41371669e+00,
       -1.36135682e+00, -1.30899694e+00, -1.04719755e+00, -7.85398163e-01,
       -5.23598776e-01, -2.61799388e-01, -4.44089210e-16,  2.61799388e-01,
        5.23598776e-01,  1.04719755e+00,  1.57079633e+00,  2.09439510e+00,
        2.61799388e+00])

PS_data1 = array([1.95875121e+01, 2.17955953e+01, 3.02014494e+01, 3.86345651e+01,
       6.58105570e+01, 1.46458509e+02, 8.09563221e+02, 2.21754612e+04,
       6.40104090e+02, 5.20057432e+05, 1.28551555e+05, 1.25288212e+05,
       8.11800261e+04, 1.40052558e+06, 9.64441464e+05, 2.08724844e+06,
       9.40242662e+05, 2.61656013e+04, 1.32967749e+05, 1.22127952e+05,
       1.23733666e+05, 3.60411360e+03, 2.00470602e+04, 4.04207652e+03,
       1.10741091e+04, 5.54534985e+02, 5.58959484e+01, 5.58399196e+00,
       3.91053138e+01, 2.73404280e+01, 0.00000000e+00, 1.59311017e+01,
       2.25706295e+01, 1.01049830e+02, 2.22757819e+02, 1.27038938e+01,
       1.34606021e+01])

w_data2 = array([-2.61799388e+00, -2.09439510e+00, -1.57079633e+00, -1.04719755e+00,
       -8.48230016e-01, -8.42994029e-01, -8.37758041e-01, -8.32522053e-01,
       -8.27286065e-01, -8.22050078e-01, -8.16814090e-01, -8.11578102e-01,
       -8.06342114e-01, -8.01106127e-01, -7.95870139e-01, -7.90634151e-01,
       -5.23598776e-01, -2.61799388e-01, -4.44089210e-16,  2.61799388e-01,
        3.14159265e-01, -4.44089210e-16,  2.61799388e-01,  5.23598776e-01,
        8.11578102e-01,  8.16814090e-01,  8.22050078e-01,  8.27286065e-01,
        8.32522053e-01,  8.37758041e-01,  8.42994029e-01,  8.48230016e-01,
        8.53466004e-01,  8.58701992e-01,  1.04719755e+00,  1.30899694e+00,
        1.57079633e+00,  1.62315620e+00,  1.62839219e+00,  1.63362818e+00,
        1.63886417e+00,  1.64410016e+00,  1.64933614e+00,  1.65457213e+00,
        1.65980812e+00,  1.66504411e+00,  1.67028009e+00,  1.67551608e+00,
        1.68075207e+00,  1.68598806e+00,  1.69122405e+00,  1.69646003e+00,
        1.70169602e+00,  1.91637152e+00,  2.09439510e+00,  2.61799388e+00])

PS_data2 = array([3.25876564e+00, 3.81409794e+00, 1.11474756e+03, 8.12689697e+02,
       1.20040924e+04, 9.14052161e+03, 2.39588219e+04, 2.58246107e+05,
       2.25913494e+05, 2.00792586e+04, 7.48461712e+04, 7.79328551e+04,
       1.78038258e+04, 1.24493550e+04, 2.49796877e+03, 3.03134727e+03,
       3.66156269e+02, 5.77324315e+00, 0.00000000e+00, 3.44609756e+01,
       4.28032233e+01, 0.00000000e+00, 3.44609756e+01, 4.21229883e+02,
       1.10116850e+05, 5.88833899e+04, 4.86932316e+03, 9.00693183e+05,
       4.35660173e+04, 1.12480933e+05, 6.37520293e+05, 1.45519285e+05,
       1.84586394e+04, 1.07850037e+05, 8.74590279e+03, 5.45370145e+03,
       9.68703597e+01, 1.96921122e+05, 1.89929087e+06, 2.66404045e+05,
       3.42301329e+05, 1.20854982e+06, 9.16540956e+05, 3.22951992e+06,
       9.83937935e+05, 1.27623169e+05, 1.38875057e+05, 3.03198348e+05,
       1.66383588e+05, 3.25265340e+04, 2.45382866e+03, 5.44667942e+04,
       9.83374694e+04, 2.93687201e+02, 3.75552805e+00, 4.48805516e+00])

Solution

  • If I correctly understand your question the following function should do the job:

    def identification(x, y, ratio=10., prominence=1e5, distance=10):
        
        peaks, stats = signal.find_peaks(y, prominence=prominence, distance=distance)
        order = np.flip(np.argsort(stats["prominences"]))
        
        values = []
        
        if len(peaks) == 1:
            values.append(x[peaks[0]])
            
        elif len(peaks) > 1:
            for i, j in zip(order[:-1], order[1:]):
                if stats["prominences"][i] >= ratio * stats["prominences"][j]:
                    values.append(x[peaks[i]])
        
        return {
            "values": values,
            "order": order,
            "peaks": peaks,
            "stats": stats,
        }
    

    It scans for peaks, sort them by heights, then only keeps peaks which are ratio times taller than the next.

    Following calls follow your observations:

    identification(w_data1, PS_data1)
    # {'values': [-1.5812683],
    #  'order': array([0]),
    #  'peaks': array([15]),
    #  'stats': {'prominences': array([2087228.8524879]),
    #   'left_bases': array([0]),
    #   'right_bases': array([30])}}
    
    identification(w_data2, PS_data2)
    # {'values': [],
    #  'order': array([2, 1, 0]),
    #  'peaks': array([ 7, 27, 43]),
    #  'stats': {'prominences': array([ 258242.84823436,  900596.3126403 , 3229516.16447195]),
    #   'left_bases': array([ 0, 21, 21]),
    #   'right_bases': array([18, 36, 54])}}