I have a dataset (see below) of two variables, x and y. I want to find for which value of x, does a maximum in y occur. My current approach is simply to look up the x which gives me the maximum y. This is not ideal as my data is quite noisy, so I would like to perform some sort of smoothing first, and then find the max.
So far, I have tried to use R to smooth my data with npreg
(kernel regression) from the np
package to obtain this curve:
but I'm not sure how to find the max.
I would like a solution to the following in Python:
1) Smooth the data (doesn't to be kernel regression)
2) Find the value of x where the max in y occurs using the smoothed data
x y
-20 0.006561733
-19 -4.48E-08
-18 -4.48E-08
-17 -4.48E-08
-16 0.003281305
-15 0.00164063
-14 0.003280565
-13 0.003282537
-12 -4.48E-08
-11 0.003281286
-10 0.004921239
-9 0.00491897
-8 -1.52E-06
-7 0.004925867
-6 -1.27E-06
-5 0.009839438
-4 0.001643726
-3 -4.48E-08
-2 2.09E-06
-1 -0.001640027
0 0.006559627
1 0.001636958
2 2.36E-06
3 0.003281469
4 0.011481469
5 0.004922279
6 0.018044207
7 0.011483134
8 0.014765087
9 0.008201379
10 0.00492497
11 0.006560482
12 0.009844796
13 0.011483199
14 0.008202129
15 0.001641621
16 0.004921645
17 0.006563377
18 0.006561068
19 0.008201004
I'd run a Gaussian filter over the data to smooth:
# first, make a function to linearly interpolate the data
f = scipy.interpolate.interp1d(x,y)
# resample with 1000 samples
xx = np.linspace(-20,19, 1000)
# compute the function on this finer interval
yy = f(xx)
# make a gaussian window
window = scipy.signal.gaussian(200, 60)
# convolve the arrays
smoothed = scipy.signal.convolve(yy, window/window.sum(), mode='same')
# get the maximum
xx[np.argmax(smoothed)]
Here's the smoothed result:
The max occurs at 6.93.
There are a whole bunch of other window functions and filtering options in scipy.signal
. See the documentation for more.