python · pandas · dataframe · grouping · analytics

Python Group by minutes in a day


I have log data that spans over 30 days. I'm looking to group the data to see which 15-minute window has the lowest number of events in total over 24 hours. The data is formatted like so:

2021-04-26 19:12:03, upload
2021-04-26 11:32:03, download
2021-04-24 19:14:03, download
2021-04-22 1:9:03, download
2021-04-19 4:12:03, upload
2021-04-07 7:12:03, download

and I'm looking for a result like

19:15:00, 2
11:55:00, 1
7:15:00, 1
4:15:00, 1
1:15:00, 1

Currently, I'm using Grouper:

df['date'] = pd.to_datetime(df['date'])
df.groupby(pd.Grouper(key="date",freq='.25H')).Host.count()

and my results look like:

date
2021-04-08 16:15:00+00:00     1
2021-04-08 16:30:00+00:00    20
2021-04-08 16:45:00+00:00     6
2021-04-08 17:00:00+00:00     6
2021-04-08 17:15:00+00:00     0
                             ..
2021-04-29 18:00:00+00:00     3
2021-04-29 18:15:00+00:00     9
2021-04-29 18:30:00+00:00     0
2021-04-29 18:45:00+00:00     3
2021-04-29 19:00:00+00:00    15

Is there any way I can group on just the time of day and not include the date?


Solution

  • Do you want something like this?

    Here, the idea is: if you're not concerned about the date, you can replace all the dates with a single arbitrary date, and then group/count the data based on the time only.

    # Use a constant so summing 'Host' counts rows per bucket.
    df.Host = 1
    # Overwrite every date with one fixed (arbitrary) date so that
    # only the time of day distinguishes the rows.
    df.date = df.date.str.replace(r'(\d{4}-\d{1,2}-\d{1,2})', '2021-04-26', regex=True)
    df.date = pd.to_datetime(df.date)
    # Count events per 15-minute bucket, drop empty buckets, keep only the time.
    new_df = df.groupby(pd.Grouper(key='date', freq='15min')).agg({'Host': 'sum'}).reset_index()
    new_df = new_df.loc[new_df['Host'] != 0]
    new_df['date'] = new_df['date'].dt.time
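
    As a variation on the same idea, you can skip the string replacement entirely by flooring each timestamp to its 15-minute bucket and grouping on the time-of-day component. A minimal sketch, using a small made-up sample in the question's log format:

    ```python
    import pandas as pd

    # Hypothetical sample matching the question's log format.
    df = pd.DataFrame({
        "date": ["2021-04-26 19:12:03", "2021-04-26 11:32:03",
                 "2021-04-24 19:14:03", "2021-04-22 1:09:03"],
        "Host": ["upload", "download", "download", "download"],
    })
    df["date"] = pd.to_datetime(df["date"])

    # Floor each timestamp to its 15-minute bucket, then keep only the
    # time-of-day part so identical windows on different dates merge.
    counts = (df.groupby(df["date"].dt.floor("15min").dt.time)
                .size()
                .sort_values())
    print(counts)
    ```

    Because the result is sorted ascending, the first row (or `counts.idxmin()`) is the quietest 15-minute window. Note this only produces buckets that contain at least one event; windows with zero events never appear, which matches the filtered output above.
    
    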