python machine-learning time-series seaborn smoothing

How to smooth timeseries with yearly data with lowess in python


I have some data that were recorded yearly, as follows.

mydata = [0.6619346141815186, 0.7170140147209167, 0.692265510559082, 0.6394098401069641, 0.6030995845794678, 0.6500746607780457, 0.6013327240943909, 0.6273292303085327, 0.5865356922149658, 0.6477396488189697, 0.5827181339263916, 0.6496025323867798, 0.6589270234107971, 0.5498126149177551, 0.48638370633125305, 0.5367399454116821, 0.517595648765564, 0.5171639919281006, 0.47503289580345154, 0.6081966757774353, 0.5808742046356201, 0.5856912136077881, 0.5608134269714355, 0.6400936841964722, 0.6766082644462585]

corresponding_year = [1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994]

I used the statsmodels Python package to calculate lowess as follows.

import statsmodels.api as sm
lowess = sm.nonparametric.lowess

z = lowess(x, y, frac= 1./3, it=3)

The output I got was as follows.

      [[1.96000000e+03, 6.95703548e-01],
       [1.96100000e+03, 6.81750671e-01],
       [1.96200000e+03, 6.68002318e-01],
       [1.96300000e+03, 6.55138324e-01],
       [1.96400000e+03, 6.38960761e-01],
       [1.96500000e+03, 6.25042177e-01],
       [1.96600000e+03, 6.18586936e-01],
       [1.96700000e+03, 6.17026334e-01],
       [1.96800000e+03, 6.14565102e-01],
       [1.96900000e+03, 6.17610340e-01],
       [1.97000000e+03, 6.20404414e-01],
       [1.97100000e+03, 6.10193222e-01],
       [1.97200000e+03, 5.90100648e-01],
       [1.97300000e+03, 5.70935248e-01],
       [1.97400000e+03, 5.47818726e-01],
       [1.97500000e+03, 5.25788570e-01],
       [1.97600000e+03, 5.18661218e-01],
       [1.97700000e+03, 5.28921300e-01],
       [1.97800000e+03, 5.42783400e-01],
       [1.97900000e+03, 5.55425915e-01],
       [1.98000000e+03, 5.71486587e-01],
       [1.98100000e+03, 5.91539778e-01],
       [1.98200000e+03, 6.13021691e-01],
       [1.98300000e+03, 6.34508409e-01],
       [1.98400000e+03, 6.57703989e-01]]

However, I am not clear on what the two values in each row of the statsmodels output are. Am I doing something wrong? I would also like to know what the two parameters frac and it do.

In addition, I would like to plot the smoothed time series using seaborn. seaborn's regplot seems to support lowess, but it does not expose the frac and it parameters. See the code below.

import numpy as np
import seaborn as sns

x = np.arange(0, 10, 0.01)
ytrue = np.exp(-x / 5) + 2 * np.sin(x / 3)
y = ytrue + np.random.normal(size=len(x))

sns.regplot(x=x, y=y, lowess=True)

In that case, is it possible to draw a regplot in seaborn using the statsmodels output?

I am happy to provide more details if needed.


Solution

  • The lowess result can be plotted as shown in the code below. Note that the first argument of lowess() is the y-values (endog) and the second is the x-values (exog), so a call such as lowess(x, y, ...) passes them in the wrong order. By default the result is a two-column array: z[:,0] holds the sorted x-values and z[:,1] the corresponding estimated y-values.

    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    import numpy as np
    
    mydata = [0.6619346141815186, 0.7170140147209167, 0.692265510559082, 0.6394098401069641, 0.6030995845794678, 0.6500746607780457, 0.6013327240943909, 0.6273292303085327, 0.5865356922149658, 0.6477396488189697, 0.5827181339263916, 0.6496025323867798, 0.6589270234107971, 0.5498126149177551, 0.48638370633125305, 0.5367399454116821, 0.517595648765564, 0.5171639919281006, 0.47503289580345154, 0.6081966757774353, 0.5808742046356201, 0.5856912136077881, 0.5608134269714355, 0.6400936841964722, 0.6766082644462585]
    corresponding_year = [1970,1971,1972,1973,1974,1975,1976,1977,1978,1979,1980,1981,1982,1983,1984,1985,1986,1987,1988,1989,1990,1991,1992,1993,1994]
    
    x = np.array(corresponding_year)
    y = np.array(mydata)
    z = sm.nonparametric.lowess(y, x, frac= 1./3, it=3)
    
    plt.plot(x, y, color='dodgerblue')
    plt.plot(z[:,0], z[:,1], 'ro-')
    
    plt.show()
    

    [Figure: resulting plot]

    PS: To compare with the seaborn regplot on the same plot, import seaborn as sns and call it before plt.show() as:

    sns.regplot(x=x, y=y, lowess=True, ax=plt.gca())