Search code examples
pythonrstatisticsmaxsmoothing

Smooth Data and Find Maximum


I have a dataset (see below) of two variables, x and y. I want to find for which value of x, does a maximum in y occur. My current approach is simply to look up the x which gives me the maximum y. This is not ideal as my data is quite noisy, so I would like to perform some sort of smoothing first, and then find the max.

So far, I have tried to use R to smooth my data with npreg (kernel regression) from the np package to obtain this curve:

enter image description here

but I'm not sure how to find the max.

I would like a solution to the following in Python:

1) Smooth the data (doesn't to be kernel regression)

2) Find the value of x where the max in y occurs using the smoothed data

x   y
-20 0.006561733
-19 -4.48E-08
-18 -4.48E-08
-17 -4.48E-08
-16 0.003281305
-15 0.00164063
-14 0.003280565
-13 0.003282537
-12 -4.48E-08
-11 0.003281286
-10 0.004921239
-9  0.00491897
-8  -1.52E-06
-7  0.004925867
-6  -1.27E-06
-5  0.009839438
-4  0.001643726
-3  -4.48E-08
-2  2.09E-06
-1  -0.001640027
0   0.006559627
1   0.001636958
2   2.36E-06
3   0.003281469
4   0.011481469
5   0.004922279
6   0.018044207
7   0.011483134
8   0.014765087
9   0.008201379
10  0.00492497
11  0.006560482
12  0.009844796
13  0.011483199
14  0.008202129
15  0.001641621
16  0.004921645
17  0.006563377
18  0.006561068
19  0.008201004

Solution

  • I'd run a Gaussian filter over the data to smooth:

    # first, make a function to linearly interpolate the data
    f = scipy.interpolate.interp1d(x,y)
    
    # resample with 1000 samples
    xx = np.linspace(-20,19, 1000)
    
    # compute the function on this finer interval
    yy = f(xx)
    
    # make a gaussian window
    window = scipy.signal.gaussian(200, 60)
    
    # convolve the arrays
    smoothed = scipy.signal.convolve(yy, window/window.sum(), mode='same')
    
    # get the maximum
    xx[np.argmax(smoothed)]
    

    Here's the smoothed result:

    The smoothed result.

    The max occurs at 6.93.

    There are a whole bunch of other window functions and filtering options in scipy.signal. See the documentation for more.