pandas, group-by

groupby year calculation on two different columns


I have the following dataframe:

import pandas as pd

date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00', '2016-02-04 00:00:00', '2016-02-04 00:30:00']
v1 = [33.24, 31.71, 34.39, 34.49]
v2 = [35.24, 33.71, 36.39, 36.49]

df = pd.DataFrame({'V1': v1, 'V2': v2}, index=pd.to_datetime(date))

print(df.head())
    
                        V1     V2
2015-02-03 23:00:00  33.24  35.24
2015-02-03 23:30:00  31.71  33.71
2016-02-04 00:00:00  34.39  36.39
2016-02-04 00:30:00  34.49  36.49

I would like to divide every row in column V1 by the sum of column V2 over the same year. How can I do that efficiently?

So far I have tried something like this, but I can't get it to work:

df["result"] = df.groupby(df.index.year).apply(lambda x: x["V1"]/x['V2'].sum())

Solution

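  • Use DatetimeIndex.year as the grouping key and groupby.transform('sum'). transform returns each year's V2 total aligned to the original rows, so V1 can be divided by it directly: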

    df['result'] = df['V1'].div(df.groupby(df.index.year)['V2'].transform('sum'))
    

    Output:

                            V1     V2    result
    2015-02-03 23:00:00  33.24  35.24  0.482088
    2015-02-03 23:30:00  31.71  33.71  0.459898
    2016-02-04 00:00:00  34.39  36.39  0.471872
    2016-02-04 00:30:00  34.49  36.49  0.473244
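
To make the intermediate step visible, here is a minimal self-contained sketch of the same approach using the question's data. The variable name yearly_v2 is only illustrative and does not appear in the original answer.

```python
import pandas as pd

date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00',
        '2016-02-04 00:00:00', '2016-02-04 00:30:00']
df = pd.DataFrame({'V1': [33.24, 31.71, 34.39, 34.49],
                   'V2': [35.24, 33.71, 36.39, 36.49]},
                  index=pd.to_datetime(date))

# transform('sum') broadcasts each year's V2 total back onto every row,
# so the result keeps the original DatetimeIndex (one value per row).
# yearly_v2 is an illustrative name, not part of the original answer.
yearly_v2 = df.groupby(df.index.year)['V2'].transform('sum')

# Index-aligned division: each V1 value over its own year's V2 total.
df['result'] = df['V1'].div(yearly_v2)

print(yearly_v2)
print(df)
```

Because yearly_v2 has the same index as df, the division lines up row by row and the assignment to df['result'] needs no reindexing.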