python dataframe list-comprehension pandas-groupby custom-function

Problems understanding the logic when creating code using groupby, list comprehensions and custom functions


I want to calculate a rolling mean of different window sizes for each ticker in my dataframe. Ideally I could pass a list of window sizes and for each ticker I would get new columns (one for each rolling mean size). So if I wanted a rolling mean of 2 and one of 3, the output would be two columns for each ticker.

import datetime as dt
import numpy as np
import pandas as pd
Dt_df = pd.DataFrame({"Date":pd.date_range('2018-07-01', periods=5, freq='D')})
Tick_df = pd.DataFrame({"Ticker":['ABC',"HIJ","XYZ"]})
Mult_df = pd.merge(Tick_df.assign(key='x'), Dt_df.assign(key='x'), on='key').drop(columns='key')
df2 = pd.DataFrame(np.random.randint(low=5, high=10, size=(15, 1)),  columns=['Price'])
df3 = Mult_df.join(df2, how='outer')
df3.set_index(['Ticker','Date'],inplace = True)

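Here is the example dataset: df3 is indexed by Ticker and Date and has a single Price column of random integers from 5 to 9 (15 rows, 5 dates for each of 3 tickers).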

When I try to apply this function:

def my_RollMeans(x):
    w = [1,2,3]
    s = pd.Series(x)
    Bob = pd.DataFrame([s.rolling(w1).mean() for w1 in w]).T
    return Bob

to my dataframe df3, using various versions of apply or transform, I get errors. For example:

NewDF = df3.groupby('Ticker').Price.transform(my_RollMeans).fillna(0)

The latest error is:

Data must be 1-dimensional


Solution

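  • If I understand correctly, use apply instead of transform, since your function returns a DataFrame with several columns rather than 1-dimensional data. I also modified your custom function so that each rolling-mean column gets its own name: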

    def my_RollMeans(x):
        w = [1,2,3]
        s = pd.Series(x)
        Bob = pd.DataFrame([s.rolling(w1).mean().rename('Price_'+str(w1)) for w1 in w]).T
        return Bob
    
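    # group_keys=False keeps the original (Ticker, Date) index; recent pandas
    # versions would otherwise prepend the group key as an extra index level
    df3.groupby('Ticker', group_keys=False).apply(lambda x : my_RollMeans(x.Price)).fillna(0)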
    

    Output:

                       Price_1  Price_2   Price_3
    Ticker Date                                  
    ABC    2018-07-01      9.0      0.0  0.000000
           2018-07-02      8.0      8.5  0.000000
           2018-07-03      7.0      7.5  8.000000
           2018-07-04      8.0      7.5  7.666667
           2018-07-05      8.0      8.0  7.666667
    HIJ    2018-07-01      8.0      0.0  0.000000
           2018-07-02      9.0      8.5  0.000000
           2018-07-03      5.0      7.0  7.333333
           2018-07-04      6.0      5.5  6.666667
           2018-07-05      7.0      6.5  6.000000
    XYZ    2018-07-01      9.0      0.0  0.000000
           2018-07-02      5.0      7.0  0.000000
           2018-07-03      9.0      7.0  7.666667
           2018-07-04      8.0      8.5  7.333333
           2018-07-05      6.0      7.0  7.666667
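
Since the question asks for a way to pass the list of window sizes in, here is a minimal sketch of one possible variant, not the only way to do it. The helper name add_rolling_means and its parameters (windows, col, level) are made up for illustration. It calls transform once per window, so each call returns 1-dimensional data and avoids the error above. The small fixed dataset replaces the random prices so the result is reproducible.

```python
import pandas as pd


def add_rolling_means(df, windows, col="Price", level="Ticker"):
    """Return one rolling-mean column per window, computed within each group.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    grouped = df.groupby(level=level)[col]
    out = pd.DataFrame(index=df.index)
    for w in windows:
        # Each transform call returns a Series aligned to df.index (1-D),
        # which is what transform expects from the function it is given.
        out[f"{col}_{w}"] = grouped.transform(lambda s, w=w: s.rolling(w).mean())
    # Mirror the fillna(0) used above; drop this to keep NaN for short windows.
    return out.fillna(0)


# Small fixed dataset: 2 tickers x 3 dates
idx = pd.MultiIndex.from_product(
    [["ABC", "HIJ"], pd.date_range("2018-07-01", periods=3, freq="D")],
    names=["Ticker", "Date"],
)
prices = pd.DataFrame({"Price": [9, 8, 7, 8, 9, 5]}, index=idx)

result = add_rolling_means(prices, windows=[2, 3])
print(result)
```

The result keeps the original (Ticker, Date) index, so it can be joined back onto the source frame with prices.join(result) if the new columns are wanted next to Price.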