
Detrending data with NaN values in scipy.signal


I have a time series dataset that contains some NaN values, and I want to detrend it.

I tried this:

scipy.signal.detrend(y)

then I got this error:

ValueError: array must not contain infs or NaNs

Then I tried with:

scipy.signal.detrend(y.dropna())

But that way I lose the original ordering of the data.

How can I solve this problem?


Solution

  • For future reference, there is a Stack Exchange site dedicated to digital signal processing, https://dsp.stackexchange.com/. I would suggest posting signal processing questions there.


    The easiest way I can think of is to detrend your data manually by fitting a least-squares line. A least-squares fit uses both your x and y values, so you can simply leave out the x values at the positions where y is NaN.

    You can grab the indices of the non-NaN values with not_nan_ind = ~np.isnan(y), and then do linear regression with the non-NaN values of y and the corresponding x values with, say, scipy.stats.linregress():

    m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind],y[not_nan_ind])
    

    Then you can simply subtract off this line from your data y to obtain the detrended data:

    detrend_y = y - (m*x + b)
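
The two steps above (fit on the non-NaN points, subtract the line everywhere) can be wrapped in a small helper. This is only a minimal sketch: the name `detrend_with_nans` is hypothetical and not part of SciPy, and it assumes `x` and `y` are one-dimensional and of equal length. NaNs in `y` stay NaN in the result, so the length and order of the data are unchanged.

```python
import numpy as np
from scipy import stats


def detrend_with_nans(x, y):
    """Subtract a least-squares line from y, fitting only where y is not NaN.

    Hypothetical helper for illustration; NaNs in y remain NaN in the output.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    valid = ~np.isnan(y)  # True where y holds a real value
    fit = stats.linregress(x[valid], y[valid])
    # Subtract the fitted line at every x; NaN - number is still NaN
    return y - (fit.slope * x + fit.intercept)


# Dummy data: a perfect line with two missing samples
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[[3, 7]] = np.nan
detrended = detrend_with_nans(x, y)
```

Because the dummy data here is an exact line, the detrended values at the non-NaN positions should come out as approximately zero.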
    

    That's all you need. For example, with some dummy data:

    import numpy as np
    from matplotlib import pyplot as plt
    from scipy import stats
    
    # create data
    x = np.linspace(0, 2*np.pi, 500)
    y = np.random.normal(0.3*x, np.random.rand(len(x)))
    drops = np.random.rand(len(x))
    y[drops > .95] = np.nan  # add some random NaNs into y (the np.NaN alias was removed in NumPy 2.0)
    plt.plot(x, y)
    

    Data with some NaN values

    # fit a regression line to the non-NaN points, then subtract it from y to detrend
    not_nan_ind = ~np.isnan(y)
    m, b, r_val, p_val, std_err = stats.linregress(x[not_nan_ind], y[not_nan_ind])
    detrend_y = y - (m*x + b)  # NaNs in y stay NaN, so the order is preserved
    plt.plot(x, detrend_y)
    

    Detrended data
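
If you would rather keep using scipy.signal.detrend, as in the question, one possible variation is to run it on the non-NaN samples only and write the result back into an array of the original length, which preserves the order. This is a sketch under one important assumption: scipy.signal.detrend fits against the sample index, so the kept samples are treated as evenly spaced and the gaps left by the NaNs are ignored. With many or long gaps the fitted trend will differ from the linregress approach above. The dummy data is made up for illustration.

```python
import numpy as np
from scipy import signal

# Evenly sampled dummy data with a few missing values
y = 0.5 * np.arange(20.0) + 3.0
y[[4, 5, 12]] = np.nan

valid = ~np.isnan(y)                 # positions that hold real values
detrended = np.full_like(y, np.nan)  # same length and order as y, NaN everywhere
# Detrend only the kept samples and put them back at their original positions
detrended[valid] = signal.detrend(y[valid])
```

If `y` is a pandas Series (as the `y.dropna()` call in the question suggests), `y.to_numpy()` should give an array that can be used the same way.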