I have a time series with some NaN values in it, and I want to detrend it.
I tried this:
scipy.signal.detrend(y)
but got this error:
ValueError: array must not contain infs or NaNs
Then I tried:
scipy.signal.detrend(y.dropna())
But that drops the NaN rows, so the result is shorter than y and no longer lines up with the original time steps.
How can I solve this problem?
For future reference, there is a Stack Exchange site for digital signal processing, https://dsp.stackexchange.com/, which is a good place for signal-processing questions.
The easiest way I can think of is to detrend the data manually by fitting a least-squares line yourself. A least-squares fit works on (x, y) pairs, so you can simply drop the x values at the positions where y is NaN and fit only the remaining pairs.
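Written out, the fit just described is an ordinary least-squares line computed over the valid samples only. Here $V$ is notation introduced for this sketch: the set of indices $i$ where $y_i$ is not NaN, with $\bar x_V$ and $\bar y_V$ the means over that set. The resulting line is then evaluated at every $x_i$, so NaN positions simply stay NaN after subtraction.

```latex
V = \{\, i : y_i \neq \mathrm{NaN} \,\}, \qquad
(\hat m, \hat b) = \arg\min_{m,\,b} \sum_{i \in V} \left( y_i - m x_i - b \right)^2

\hat m = \frac{\sum_{i \in V} (x_i - \bar x_V)(y_i - \bar y_V)}{\sum_{i \in V} (x_i - \bar x_V)^2},
\qquad
\hat b = \bar y_V - \hat m \, \bar x_V

\tilde y_i = y_i - (\hat m x_i + \hat b) \quad \text{for all } i
```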
You can build a boolean mask of the non-NaN values with not_nan_ind = ~np.isnan(y), and then run a linear regression on the non-NaN values of y and the corresponding x values with, say, scipy.stats.linregress():
m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind],y[not_nan_ind])
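If you prefer to stay within NumPy, or later want a polynomial trend instead of a straight line, np.polyfit performs the same least-squares fit on the masked data. This is a swapped-in alternative, not part of the approach above; the small dataset is made up purely for the comparison.

```python
import numpy as np
from scipy import stats

# Made-up data: a straight line plus small offsets, with two gaps (NaNs)
x = np.linspace(0, 10, 11)
y = 0.5 * x + 2 + np.array([0.1, -0.2, 0.0, np.nan, 0.3, -0.1,
                            0.2, np.nan, -0.3, 0.1, 0.0])

not_nan_ind = ~np.isnan(y)  # True where y has a real value

# Fit with scipy.stats.linregress on the valid pairs only
res = stats.linregress(x[not_nan_ind], y[not_nan_ind])

# Same straight-line fit with NumPy; coefficients come back highest degree first
m_np, b_np = np.polyfit(x[not_nan_ind], y[not_nan_ind], deg=1)

print(res.slope, m_np)
print(res.intercept, b_np)
```

Raising deg (for example to 2) would remove a quadratic trend instead, which scipy.signal.detrend does not offer directly.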
Then you can simply subtract this line from your data y to obtain the detrended data:
detrend_y = y - (m*x + b)
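Because the trend is evaluated at every x, not just the valid ones, the subtraction keeps the original length and leaves each NaN where it was, which is exactly the alignment that dropna() lost. A minimal check on a tiny, made-up series that lies exactly on y = 2x + 1 where it is defined:

```python
import numpy as np
from scipy import stats

x = np.arange(6, dtype=float)
y = np.array([1.0, 3.0, np.nan, 7.0, np.nan, 11.0])  # y = 2x + 1 where not NaN

not_nan_ind = ~np.isnan(y)
m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind], y[not_nan_ind])

# Trend evaluated at *every* x, so detrend_y has the same shape as y
detrend_y = y - (m * x + b)

print(detrend_y.shape)      # same length as y
print(np.isnan(detrend_y))  # NaNs are still at the same positions
```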
And that's all you need. For example with some dummy data:
import numpy as np
from matplotlib import pyplot as plt
from scipy import stats
# create data
x = np.linspace(0, 2*np.pi, 500)
y = np.random.normal(0.3*x, np.random.rand(len(x)))
drops = np.random.rand(len(x))
y[drops > .95] = np.nan  # add some random NaNs into y (np.NaN was removed in NumPy 2.0)
plt.plot(x, y)
# find linear regression line, subtract off data to detrend
not_nan_ind = ~np.isnan(y)
m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind],y[not_nan_ind])
detrend_y = y - (m*x + b)
plt.plot(x, detrend_y)
plt.show()
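Since the question calls y.dropna(), y is presumably a pandas Series. The steps above can be wrapped into a small helper that returns a Series with the same index, length and NaN positions. The name detrend_with_nans is hypothetical, and by default it assumes the samples are evenly spaced (x = 0, 1, 2, ...), which is the same assumption scipy.signal.detrend makes; pass x explicitly if the spacing is irregular.

```python
import numpy as np
import pandas as pd
from scipy import stats


def detrend_with_nans(y, x=None):
    """Hypothetical helper: remove a least-squares linear trend from a
    pandas Series that contains NaNs, keeping its index and NaN positions.

    If x is None, samples are assumed to be evenly spaced at 0, 1, 2, ...
    """
    values = y.to_numpy(dtype=float)
    if x is None:
        x = np.arange(len(values), dtype=float)  # position of every sample
    else:
        x = np.asarray(x, dtype=float)

    valid = ~np.isnan(values)                    # fit only on real values
    fit = stats.linregress(x[valid], values[valid])
    trend = fit.slope * x + fit.intercept        # trend at *all* positions
    return pd.Series(values - trend, index=y.index, name=y.name)


s = pd.Series([1.0, 3.0, np.nan, 7.0, 9.0], index=list("abcde"))
out = detrend_with_nans(s)
print(out)
```

For a Series with an irregular DatetimeIndex, one option (again an assumption about your data) is to pass elapsed time, e.g. x=(y.index - y.index[0]).total_seconds(), so the trend is fitted against real time rather than sample number.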