I have a DataFrame
given with a 10 minute interval which looks likes this:
intervall_start group count
0 2014-03-05 14:30:00 A 2
1 2014-03-05 14:40:00 A 2
2 2014-03-05 14:50:00 B 2
3 2014-03-05 15:00:00 B 1
4 2014-03-05 15:10:00 A 4
5 2014-03-05 15:20:00 B 1
My goal is now to extrapolate this 10 minute interval into a 15 minute, 30 minute or 1 hour interval. I want to set the interval in which the 10 minute interval should be extrapolated. So it should be dynamically!
Let's say I want to extrapolate the 10 minute interval into a 15 minute interval then I want to have this result.
intervall_start group count
0 2014-03-05 14:30:00 A (10/15 * 2 + 5/15 * 2) = 2
1 2014-03-05 14:45:00 A (5/15 * 2 + 10/15 * 0) = 0.666
2 2014-03-05 15:00:00 A (10/15 * 4 + 5/15 * 0) = 2.666
3 2014-03-05 15:15:00 A (5/15 * 4 + 10/15 * 1) = 2
intervall_start group count
0 2014-03-05 14:45:00 B (5/15 * 0 + 10/15 * 2) = 1.333
1 2014-03-05 15:00:00 B (10/15 * 1 + 5/15 * 0) = 0.666
2 2014-03-05 15:15:00 B (5/15 * 0 + 10/15 * 1) = 0.666
Do you guys have any idea how to do that? But as I said, dynamically! I also want to be able to shift it into a 5 minute or 1 hour interval!
Why I use 10/15
and 5/15
as weights? Because this is the weight for the 15 minute interval. For exmaple if I have 3 counts from 15:00 to 15:10 then this is a 10/15 * 3
weight!
Explanation of count:
I count in my database if the gate for the group was open or closed. 1 is opened and 0 is closed. I just select the gates which are opened (1) and count them. So, I know how many gates were opened for this group!
Now I am not allowed to sum those counts up! Because otherwise I would have more gates opened then I actually have! I have to calculate the average of gates which were opened for the timeframe!
Kind regards and thank you!
I believe what you're looking for is resample
:
# Create a date-time index
df.intervall_start = pd.to_datetime(df.intervall_start)
df.set_index("intervall_start", inplace = True)
# make sure we have a sample every 10 minutes (fill missing samples with 0)
df = df.groupby("group")[["count"]].resample("10min").asfreq()
df = df.fillna(0)
# sample every minute, to be later used when calculatiung the mean per period
df = df.reset_index(level = "group").groupby("group")[["count"]].resample("1min").pad()
# resample for every 15 minutes using 'mean'
df = df.reset_index(level = "group").groupby("group")[["count"]].resample("15min").mean()
The output is:
count
group intervall_start
A 2014-03-05 14:30:00 2.000000
2014-03-05 14:45:00 0.666667
2014-03-05 15:00:00 0.363636
B 2014-03-05 14:45:00 2.000000
2014-03-05 15:00:00 0.666667
2014-03-05 15:15:00 0.166667