Tags: python, pandas, date, datetime, timeline

How to aggregate a pandas datetime series by hour only


I have a pandas timeline table containing datetime objects and scores:

          datetime   score
2018-11-23 08:33:02      4
2018-11-24 09:43:30      2
2018-11-25 08:21:34      5
2018-11-26 19:33:01      4
2018-11-23 08:50:40      1
2018-11-23 09:03:10      3

I want to aggregate the score by hour, ignoring the date. The desired result is:

08:00:00        10
09:00:00        5
19:00:00        4

So basically I have to drop the date part (day, month, year) and then group the scores by hour.

I tried this command:

monthagg = df['score'].resample('H').sum().to_frame()

This works, but it still takes the date (day, month, year) into account. How can I drop the date and aggregate by hour only?


Solution

  • One possible solution is to use DatetimeIndex.floor to set the minutes and seconds to 0, convert the DatetimeIndex to strings with DatetimeIndex.strftime, and then aggregate with sum:

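    # assumes 'datetime' is the index; recent pandas releases prefer the lowercase alias 'h'
    a = df['score'].groupby(df.index.floor('H').strftime('%H:%M:%S')).sum()
    # if 'datetime' is a column rather than the index:
    # a = df['score'].groupby(df['datetime'].dt.floor('H').dt.strftime('%H:%M:%S')).sum()
    print(a)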
    08:00:00    10
    09:00:00     5
    19:00:00     4
    Name: score, dtype: int64
    

    Alternatively, group by DatetimeIndex.hour, which gives the hour as an integer, and aggregate with sum:

    # assumes 'datetime' is the index
    a = df.groupby(df.index.hour)['score'].sum()
    # if 'datetime' is a column rather than the index:
    # a = df.groupby(df['datetime'].dt.hour)['score'].sum()
    print(a)
    datetime
    8     10
    9      5
    19     4
    Name: score, dtype: int64
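
For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample data from the question, assuming 'datetime' is a regular column rather than the index. The names by_hour and by_label are only illustrative. The second variant is a small shortcut that is not part of the answer above: it skips floor and writes the zeroed minutes and seconds directly in the strftime format string.

```python
import pandas as pd

# Sample data copied from the question; 'datetime' is kept as a column here.
df = pd.DataFrame({
    "datetime": pd.to_datetime([
        "2018-11-23 08:33:02",
        "2018-11-24 09:43:30",
        "2018-11-25 08:21:34",
        "2018-11-26 19:33:01",
        "2018-11-23 08:50:40",
        "2018-11-23 09:03:10",
    ]),
    "score": [4, 2, 5, 4, 1, 3],
})

# Variant 1: group on the integer hour, so rows from different dates
# that share an hour land in the same bucket.
by_hour = df.groupby(df["datetime"].dt.hour)["score"].sum()

# Optional: relabel the integer hours as HH:00:00 strings to match the
# layout requested in the question.
by_hour.index = [f"{h:02d}:00:00" for h in by_hour.index]

# Variant 2 (shortcut, not from the answer above): format each timestamp
# as its hour with literal ":00:00" and group on that string.
by_label = df.groupby(df["datetime"].dt.strftime("%H:00:00"))["score"].sum()

print(by_hour)
print(by_label)
```

Both variants discard the date entirely, which is the difference from resample: resample bins along the actual timeline, so the same hour on two different days stays in two separate bins.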