I have the following dataframe:
import pandas as pd

date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00', '2016-02-04 00:00:00', '2016-02-04 00:30:00']
v1 = [33.24, 31.71, 34.39, 34.49]
v2 = [35.24, 33.71, 36.39, 36.49]
df = pd.DataFrame({'V1': v1, 'V2': v2}, index=pd.to_datetime(date))
print(df.head())
                        V1     V2
2015-02-03 23:00:00  33.24  35.24
2015-02-03 23:30:00  31.71  33.71
2016-02-04 00:00:00  34.39  36.39
2016-02-04 00:30:00  34.49  36.49
I would like to do the following operation: divide every row in column V1 by the sum of column V2 over that row's year. How can I do that efficiently?
So far I have tried to put together something like this, but I can't get it to work:
df["result"] = df.groupby(df.index.year).apply(lambda x: x["V1"]/x['V2'].sum())
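Use DatetimeIndex.year as the grouping key together with groupby.transform. Unlike apply, transform('sum') returns a Series with the same index as the original frame, so each row gets its own year's V2 total and the division lines up row by row: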
df['result'] = df['V1'].div(df.groupby(df.index.year)['V2'].transform('sum'))
Output:
                        V1     V2    result
2015-02-03 23:00:00  33.24  35.24  0.482088
2015-02-03 23:30:00  31.71  33.71  0.459898
2016-02-04 00:00:00  34.39  36.39  0.471872
2016-02-04 00:30:00  34.49  36.49  0.473244
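To make the alignment explicit, here is a minimal self-contained sketch of the same approach that splits the one-liner into two steps. The intermediate name yearly_v2 is my own and is only there for illustration. It holds the per-year sum of V2 broadcast back to every row, which is why a plain element-wise division works afterwards.

```python
import pandas as pd

date = ['2015-02-03 23:00:00', '2015-02-03 23:30:00',
        '2016-02-04 00:00:00', '2016-02-04 00:30:00']
df = pd.DataFrame({'V1': [33.24, 31.71, 34.39, 34.49],
                   'V2': [35.24, 33.71, 36.39, 36.49]},
                  index=pd.to_datetime(date))

# Per-year total of V2, repeated for every row of that year.
# The result has the same index as df (illustrative variable name).
yearly_v2 = df.groupby(df.index.year)['V2'].transform('sum')

# Both operands share df's index, so this divides row by row.
df['result'] = df['V1'].div(yearly_v2)

print(yearly_v2)
print(df)
```

Printing yearly_v2 should show the 2015 total on the first two rows and the 2016 total on the last two, which is an easy way to sanity-check the grouping before dividing.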