python pandas time-series

Group by hours and minutes without multiple indexes


I have a data frame with two columns. 'login_time' holds punch-in times as datetimes. The times are 15 minutes apart and span more than 30 years. The second column, 'counted', is the number of times people punched in during that interval.

login_time counted
1970-01-01 20:00:00 2
1970-01-01 20:15:00 6

I want to count how many people clocked in during each time-of-day interval, summed over all dates.

interval total punches
20:00:00 3008
20:15:00 42000

I tried the following:

df_grouped = df2.groupby([df2['login_time'].dt.hour, df2['login_time'].dt.minute]).counted.sum()

It kind of works, but I get a MultiIndex with the hour as one level and the minute as the other.

Is it possible to get the output shown above, or at least to unite the index levels into one?


Solution

  • Instead of grouping by dt.hour and dt.minute, simply group by dt.time, which gives one time-of-day key per row and therefore a single-level index:

    import pandas as pd
    
    data = {'login_time': {0: pd.Timestamp('1970-01-01 20:00:00'),
                           1: pd.Timestamp('1970-01-01 20:15:00'),
                           2: pd.Timestamp('1970-01-02 20:00:00'),
                           3: pd.Timestamp('1970-01-02 20:15:00')},
            'counted': {0: 2, 1: 6, 2: 2, 3: 6}}
    
    df = pd.DataFrame(data)
    
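    # Group on the time-of-day part only, so the same interval on different dates is summed together
    df_grouped = (df.groupby(df['login_time'].dt.time)['counted']
                  .sum()
                  .reset_index()  # turn the time index back into a regular column
                  .rename(columns={'login_time': 'interval', 'counted': 'total punches'}))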
    
    df_grouped
    
       interval  total punches
    0  20:00:00              4
    1  20:15:00             12
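
If the hour/minute grouping from the question needs to stay, the two index levels can also be joined into a single label afterwards, which covers the "unite the indices into one" part of the question. The following is only a minimal sketch, assuming the same sample data as above; the names `by_hm`, `flat` and `by_str` are made up for illustration. It also shows grouping on a formatted string via `dt.strftime`, which gives string labels rather than `datetime.time` objects.

```python
import pandas as pd

# Sample data mirroring the example above (illustrative values only).
df = pd.DataFrame({
    'login_time': pd.to_datetime(['1970-01-01 20:00:00', '1970-01-01 20:15:00',
                                  '1970-01-02 20:00:00', '1970-01-02 20:15:00']),
    'counted': [2, 6, 2, 6],
})

# Grouping by hour and minute produces a two-level MultiIndex.
# The levels are renamed so they do not both carry the name 'login_time'.
hour = df['login_time'].dt.hour.rename('hour')
minute = df['login_time'].dt.minute.rename('minute')
by_hm = df.groupby([hour, minute])['counted'].sum()

# Collapse each (hour, minute) tuple into a single 'HH:MM:00' label.
by_hm.index = [f'{h:02d}:{m:02d}:00' for h, m in by_hm.index]
flat = by_hm.rename_axis('interval').reset_index(name='total punches')

# Alternative: group directly on a formatted time string.
by_str = df.groupby(df['login_time'].dt.strftime('%H:%M:%S'))['counted'].sum()

print(flat)
print(by_str)
```

Grouping on `dt.time` as in the solution above is still the shortest route; the string-based variants are mainly useful when the result has to be exported or joined on text labels.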