Search code examples
pythonpandasdataframegroup-bymoving-average

Customized Moving Average on Pandas Dataframe With GroupBy


There is a dataset with columns ID and Feature_1. Feature_1 could be understood as a specific duration of session in secs. There is also a custom function, which calculates moving average with addition of simple average in the begining according to number of NaN's caused by window width. Here it is:

def moving_average_mit_anfang(x, w):
        # First part - simple average
        first_part_result = np.cumsum(x)/np.cumsum(np.ones(len(x)))
        # If appearence of user's sessions is greater than window width, we calculate moving average
        if len(x)>w:
            # Second part - moving average with window w
            sec_part_result = np.convolve(x, np.ones(w), 'valid') / w
            return np.append(first_part_result[:-len(sec_part_result)],sec_part_result)
        # Otherwise we calculate only simple average
        else:
            return first_part_result

We should apply this function on column Featrue_1 in such way, that we get current average value for each ID, according to appearence time of corresponding ID.

Example dataframe:

pd.DataFrame(data={'ID':[1,2,3,2,3,1,2,1,3,3,3,2,1],
                   'Feature_1':[4,5,6,73,2,21,13,45,32,9,18,45,39]})

I tried this:

test_df.groupby('ID')['Feature_1'].transform(lambda x: moving_average_mit_anfang(x,1))

And got this:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-38-6cc3e6c9b134> in <module>
----> 1 test_df.groupby('ID')['Feature_1'].transform(lambda x: moving_average_mit_anfang(x,1))

~/DS/RS/rs_env/lib/python3.8/site-packages/pandas/core/groupby/generic.py in transform(self, func, engine, engine_kwargs, *args, **kwargs)
    505 
    506         if not isinstance(func, str):
--> 507             return self._transform_general(func, *args, **kwargs)
    508 
    509         elif func not in base.transform_kernel_allowlist:

~/DS/RS/rs_env/lib/python3.8/site-packages/pandas/core/groupby/generic.py in _transform_general(self, func, *args, **kwargs)
    535                 res = res._values
    536 
--> 537             results.append(klass(res, index=group.index))
    538 
    539         # check for empty "results" to avoid concat ValueError

~/DS/RS/rs_env/lib/python3.8/site-packages/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    346                 try:
    347                     if len(index) != len(data):
--> 348                         raise ValueError(
    349                             f"Length of passed values is {len(data)}, "
    350                             f"index implies {len(index)}."

ValueError: Length of passed values is 6, index implies 4.

Output should be like:

    ID  Feature_1  Custom average
0    1          4             4.0
1    2          5             5.0
2    3          6             6.0
3    2         73            39.0
4    3          2             4.0
5    1         21            12.5
6    2         13            43.0
7    1         45            33.0
8    3         32             4.0
9    3          9            20.5
10   3         18            13.5
11   2         45            29.0
12   1         39            42.0

Solution

  • Your new solution working, also is possible omit lambda function for simplier solution (with lambda working too):

    test_df['Custom average'] = test_df.groupby('ID')['Feature_1'].transform(moving_average_mit_anfang,2)
    print (test_df)
        ID  Feature_1  Custom average
    0    1          4             4.0
    1    2          5             5.0
    2    3          6             6.0
    3    2         73            39.0
    4    3          2             4.0
    5    1         21            12.5
    6    2         13            43.0
    7    1         45            33.0
    8    3         32            17.0
    9    3          9            20.5
    10   3         18            13.5
    11   2         45            29.0
    12   1         39            42.0