I have tabular data (columns Col1, Col2, ...) associated with a timestamp and a custom-made group. Example:
           Group  Col1  Col2  Col3
Time
2020-05-18     A    10    20    30
2020-05-18     B    10    20    30
2020-05-18     C    10    20    30
2020-05-19     A    10    20    30
2020-05-19     B    10    20    30
2020-05-19     C    10    20    30
2020-05-20     A    10    20    30
2020-05-20     B    10    20    30
2020-05-20     C    10    20    30
2020-05-21     A    10    20    30
2020-05-21     B    10    20    30
2020-05-21     C    10    20    30
This dataframe can be generated by:
import pandas as pd
dataframe = pd.DataFrame({"Time": ["2020-05-18", "2020-05-18", "2020-05-18", "2020-05-19", "2020-05-19", "2020-05-19", "2020-05-20", "2020-05-20", "2020-05-20", "2020-05-21", "2020-05-21", "2020-05-21"], "Group": ["A","B","C","A","B","C","A","B","C","A","B","C"], "Col1": 12*[10], "Col2": 12*[20], "Col3": 12*[30]})
dataframe["Time"] = pd.to_datetime(dataframe["Time"])
dataframe = dataframe.set_index("Time")
I would like to create time bins (as with the resample function), average the values WITHIN each group, and THEN sum those averages within each time bin.
But if I write dataframe.resample("2D").sum(), the values are summed directly over all groups without first being averaged within each group. Like this:
            Col1  Col2  Col3
Time
2020-05-18    60   120   180
2020-05-20    60   120   180
But what I need is to average values belonging to the same group BEFORE summing them up. In this case the desired output would be:
            Col1  Col2  Col3
Time
2020-05-18    30    60    90
2020-05-20    30    60    90
I tried to apply groupby after resampling (e.g. dataframe.resample("2D").groupby("Group")), but it raises a TypeError ("'TimeGrouper' object is not callable").
What could be a way to solve this problem? Thanks in advance.
EDIT:
Output of dataframe.groupby(["Time","Group"]).mean():
                  Col1  Col2  Col3
Time       Group
2020-05-18 A        10    20    30
           B        10    20    30
           C        10    20    30
2020-05-19 A        10    20    30
           B        10    20    30
           C        10    20    30
2020-05-20 A        10    20    30
           B        10    20    30
           C        10    20    30
2020-05-21 A        10    20    30
           B        10    20    30
           C        10    20    30
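Use pd.Grouper inside groupby and aggregate with mean. Because Time is the index of the dataframe above, pass level='Time' (key='Time' only applies when Time is a regular column):
print(dataframe.groupby(['Group', pd.Grouper(freq='2D', level='Time')]).mean())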
                  Col1  Col2  Col3
Group Time
A     2020-05-18    10    20    30
      2020-05-20    10    20    30
B     2020-05-18    10    20    30
      2020-05-20    10    20    30
C     2020-05-18    10    20    30
      2020-05-20    10    20    30
And then sum by the second level, Time:
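df = (
    dataframe
    # average within each group and each 2-day bin
    .groupby(['Group', pd.Grouper(freq='2D', level='Time')])
    .mean()
    # then add the group averages together per time bin
    # (sum(level=...) was removed in pandas 2.0, so group by the level instead)
    .groupby(level='Time')
    .sum()
)
print(df)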
            Col1  Col2  Col3
Time
2020-05-18    30    60    90
2020-05-20    30    60    90
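Since every value in the sample data is identical, it is hard to see that the averaging step really happens before the sum. Here is a minimal sketch with made-up, varying numbers (the values and the names per_group_mean and result are only for illustration, not from the question) that should make the difference visible. It assumes Time is the index, as in the question.

```python
import pandas as pd

# Hypothetical data: same layout as the question, but with varying values
df = pd.DataFrame(
    {
        "Time": pd.to_datetime(
            ["2020-05-18", "2020-05-19", "2020-05-18", "2020-05-19"]
        ),
        "Group": ["A", "A", "B", "B"],
        "Col1": [10, 30, 100, 300],
    }
).set_index("Time")

# Step 1: average within each group and each 2-day bin
per_group_mean = df.groupby(
    ["Group", pd.Grouper(freq="2D", level="Time")]
).mean()

# Step 2: sum the group averages within each time bin
result = per_group_mean.groupby(level="Time").sum()
print(result)
```

Both days fall into the single 2-day bin starting on 2020-05-18. Group A averages to 20 and group B to 200, so the bin total is 220, whereas a plain resample("2D").sum() on Col1 would give 440.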