Tags: pandas, time-series, sklearn-pandas, scikits

Finding local minimum between two peaks


I have some time series data in Pandas where I need to extract specific local minima from a column so I can use them as features in an LSTM model. To visualize what I'm looking for I've attached a picture, where the circled points are the values that I wish to locate.

The other red dots that you see at the bottom of the graph are my failed attempt at using "argrelextrema" with the following code:

#Trying to Locate Minimum Values
df['HKL Min'] = df.iloc[argrelextrema(df.hkla.values, np.less_equal,order=50)[0]]['hkla']

#Plotting a range of values from dataset:
sns.lineplot(x=df.index[0:3000], y= 'hkla', data=df[0:3000], label='Hookload');
sns.scatterplot(x=df.index[0:3000], y= 'HKL Min', data=df[0:3000], s= 50, color ='red', label='HKL Min');

As you may notice, my column data has a repetitive pattern, and the points I wish to locate are the minima found between two "peak pairs". Are there any existing functions in Python that can help me locate these specific points? Any form of help would be highly appreciated. I am also open to other suggestions that can solve my issue.


Solution

  • You could do something like this with your data:

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd
    from scipy.signal import argrelextrema
    
    
    np.random.seed(1234)
    rs = np.random.randn(500)
    xs = [0]
    for r in rs:
        xs.append(xs[-1] * 0.999 + r)
    df = pd.DataFrame(xs, columns=['point'])
    

    which gives this data

    point
    0    0.000000
    1    0.471435
    2   -0.720012
    3    0.713415
    4    0.400050
    ..        ...
    496  3.176240
    497  3.007734
    498  3.123841
    499  1.045736
    500  0.041935
    
    [501 rows x 1 columns]
    
    

    You can choose how often you want to mark a local max or min by adjusting the order parameter, which sets how many points on each side a value is compared against:

    n = 10
    
    df['min'] = df.iloc[argrelextrema(df.point.values, np.less_equal,
                        order=n)[0]]['point']
    df['max'] = df.iloc[argrelextrema(df.point.values, np.greater_equal,
                        order=n)[0]]['point']
    
    
    plt.scatter(df.index, df['min'], c='r')
    plt.scatter(df.index, df['max'], c='r')
    plt.plot(df.index, df['point'])
    plt.show()
    

    Which gives:

    [Image: plot of the series with the local minima and maxima for n = 10 marked in red]

    Another choice for n might be (and it all depends on what you want):

    n = 40
    
    df['min'] = df.iloc[argrelextrema(df.point.values, np.less_equal,
                        order=n)[0]]['point']
    df['max'] = df.iloc[argrelextrema(df.point.values, np.greater_equal,
                        order=n)[0]]['point']
    
    
    plt.scatter(df.index, df['min'], c='r')
    plt.scatter(df.index, df['max'], c='g')
    plt.plot(df.index, df['point'])
    plt.show()
    
    

    [Image: plot of the series for n = 40, with minima marked in red and maxima in green]
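
To make the effect of the order argument concrete, here is a minimal sketch on a small made-up array (the values in y are invented for illustration and are not taken from the data above). With np.less_equal, a sample is kept only if it is less than or equal to every sample within order positions on each side, so a larger order keeps only the deeper, more isolated dips.

```python
import numpy as np
from scipy.signal import argrelextrema

# Toy signal, invented for illustration: three dips of different depth.
y = np.array([5, 3, 4, 6, 2, 6, 7, 4, 8])

# order=1: a sample only has to be <= its two immediate neighbours.
idx_small = argrelextrema(y, np.less_equal, order=1)[0]

# order=3: a sample has to be <= every sample within 3 positions on each side.
idx_large = argrelextrema(y, np.less_equal, order=3)[0]

print(idx_small)  # indices of all three dips
print(idx_large)  # only the deepest dip survives
```

On real data the right order depends on how far apart the minima of interest are, so it usually has to be tuned by eye against a plot.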

    To get a marker for which points actually were a max or a min, you can make a new df:

    new_df = pd.DataFrame(np.where(df.T == df.T.max(), 1, 0),index=df.columns).T
    

    which shows which rows in df are a maximum or a minimum. Alternatively, the original df already contains that information in the created min and max columns: the extrema are the rows where those columns are not NaN.
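
Since the min and max columns are NaN everywhere except at the extrema, the 0/1 flags can also be read off directly with notna. The sketch below uses a small hypothetical frame in place of the real df, and the column names is_min and is_max are illustrative choices, not something from the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 'min'/'max' hold the value at an extremum
# and NaN everywhere else, as produced by the argrelextrema assignment.
df = pd.DataFrame({
    "point": [1.0, 0.2, 0.9, 2.5, 1.1],
    "min": [np.nan, 0.2, np.nan, np.nan, np.nan],
    "max": [np.nan, np.nan, np.nan, 2.5, np.nan],
})

# 1 where the row is a marked extremum, 0 otherwise.
flags = df[["min", "max"]].notna().astype(int)
flags.columns = ["is_min", "is_max"]  # illustrative names

print(flags)
```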

    EDIT: Finding peaks above threshold

    If you are interested in peaks above a certain value, then you can use find_peaks in the following way:

    from scipy.signal import find_peaks 
    peaks, _ = find_peaks(df['point'], height = 15)
    plt.plot(df['point'])
    plt.plot(peaks, df['point'][peaks], "x")
    plt.show()
    

    which will produce:

    peaks,_
    
    
    (array([304, 309, 314, 317, 324, 329, 333, 337, 343, 349, 352, 363, 366,
            369, 372, 374, 377, 379, 381, 383, 385, 387, 391, 394, 397, 400,
            403, 410, 413, 418, 424, 427, 430, 433, 436, 439, 442, 444, 448],
           dtype=int64),
     {'peak_heights': array([15.68868141, 15.97184882, 15.04790966, 15.6146908 , 16.49191501,
             18.0852033 , 18.11467247, 19.48469432, 21.32391722, 19.90407526,
             19.93683051, 24.40980129, 28.00319793, 26.1080406 , 24.44322213,
             23.16993982, 22.27505873, 21.47500832, 22.3236231 , 24.02484906,
             23.83727054, 24.32609486, 21.25365717, 21.10295203, 20.03162979,
             20.64021444, 19.78510855, 21.62624829, 22.34904425, 21.60431638,
             18.41968769, 18.24153961, 18.00747871, 18.02793964, 16.72552016,
             17.58573207, 16.90982675, 16.9905686 , 16.30563852])})
    

    and graphically: [Image: plot of the series with the peaks above the threshold marked with an x]
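
For the original question (the minimum between two peaks), one possible approach is to combine find_peaks with a per-segment argmin: detect the peaks first, then take the lowest sample between each pair of consecutive peaks. This is only a sketch under assumptions: the helper name minima_between_peaks and the toy signal are made up for illustration, and it returns a valley for every pair of neighbouring peaks, so selecting only the valleys between "peak pairs" would need extra filtering.

```python
import numpy as np
from scipy.signal import find_peaks


def minima_between_peaks(y, **peak_kwargs):
    """Return (peaks, valleys); valleys[i] is the index of the lowest
    sample strictly between peaks[i] and peaks[i + 1].

    Hypothetical helper: peak_kwargs are passed straight to find_peaks
    (for example height=..., distance=..., prominence=...).
    """
    y = np.asarray(y, dtype=float)
    peaks, _ = find_peaks(y, **peak_kwargs)
    valleys = np.array(
        [lo + 1 + np.argmin(y[lo + 1:hi]) for lo, hi in zip(peaks[:-1], peaks[1:])],
        dtype=int,
    )
    return peaks, valleys


# Toy signal, invented for illustration: four peaks above a height of 4.
y = np.array([0, 5, 1, 6, 0, 0, 7, 2, 8, 0])
peaks, valleys = minima_between_peaks(y, height=4)
print(peaks)    # indices of the detected peaks
print(valleys)  # index of the minimum between each consecutive pair
```

On the data from the question this would be called on df['hkla'].values, with height, distance or prominence tuned so that only the peaks of interest are detected. If the valleys inside a peak pair are unwanted, one option is to keep only the valleys whose neighbouring peaks are far apart.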