python pandas statsmodels smoothing

Spline smoothing using statsmodels within a Python pandas DataFrame


I need to smooth sales percentage values per group; the values can be erratic because of out-of-stock situations. My data is in a pandas DataFrame. Here is the code I am trying:

from scipy.interpolate import UnivariateSpline
s = base_data1.groupby(['MDSE_ITEM_I','CO_LOC_I'])\
.transform(lambda x: UnivariateSpline(np.arange(x.count()), x['PCT_TILL_DATE'].value, s=x.count()))

Here I am passing np.arange(x.count()) as the monotonically increasing x array, the values of the pandas Series x['PCT_TILL_DATE'] as y, and x.count() as the smoothing factor s, which should be large enough. However, I get this error:

KeyError: ('PCT_TILL_DATE', u'occurred at index GREG_D')

What am I missing here?


Solution

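  • You don't need to select the column inside the function: transform() already passes each column of each group to it as a Series, and a Series can't be indexed by column name like that. This is why the KeyError is reported while processing another column (GREG_D).

    Also, UnivariateSpline returns a fitted object, which you need to call with your desired x values to get the actual smoothed values.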

    import numpy as np
    import pandas as pd
    from scipy.interpolate import UnivariateSpline

    n = 16
    df = pd.DataFrame({'data1': np.cos(np.linspace(0, np.pi*4, n)),
                       'data2': np.random.randn(n),
                       'class_a': np.array([0]*(n//2) + [1]*(n//2)),
                       'class_b': np.array([1]*n)})

    def grpfunc(grp):
        # grp is one column of one group, passed in as a Series
        n = len(grp)
        x = np.arange(n)

        # fit the smoothing spline, then evaluate it at the same x positions
        spl = UnivariateSpline(x, grp.values, s=n)
        return spl(x)

    df.groupby(['class_a', 'class_b']).transform(grpfunc)
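
Applied to the frame from the question, the same idea might look like the sketch below. The column names come from the question, but the data is made up for illustration, and `smooth` and `PCT_SMOOTH` are hypothetical names. It assumes the rows of each group are already sorted by date and that every group has at least four rows, since the default cubic spline needs more points than its degree.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import UnivariateSpline

# Hypothetical stand-in for base_data1: two item/location groups, 10 dates each.
rng = np.random.default_rng(0)
n = 10
base_data1 = pd.DataFrame({
    'MDSE_ITEM_I': [1] * n + [2] * n,
    'CO_LOC_I': [100] * (2 * n),
    'GREG_D': list(pd.date_range('2020-01-01', periods=n)) * 2,
    'PCT_TILL_DATE': np.tile(np.linspace(0.0, 1.0, n), 2)
                     + rng.normal(0, 0.05, 2 * n),
})

def smooth(series):
    # transform() passes one group's PCT_TILL_DATE values as a Series.
    m = len(series)
    x = np.arange(m)
    spl = UnivariateSpline(x, series.values, s=m)  # fit the spline
    return spl(x)                                  # evaluate it at the same x

# Select the column first, then transform; the result keeps the original index,
# so it can be assigned straight back as a new column.
base_data1['PCT_SMOOTH'] = (
    base_data1.groupby(['MDSE_ITEM_I', 'CO_LOC_I'])['PCT_TILL_DATE']
              .transform(smooth)
)
```

Selecting `['PCT_TILL_DATE']` before `transform()` keeps the function from being applied to non-numeric columns such as `GREG_D`.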
    
