Search code examples
pythonpandasdataframedatetimeintervals

How do I label a Pandas datetime row (in a new column) for every 15 minute interval?


I have a pandas dataframe that looks something like this:

Date
31/8/20 4:56:00
31/8/20 4:45:00
31/8/20 4:39:00
31/8/20 4:36:00
31/8/20 4:34:00
31/8/20 4:05:00
31/8/20 4:04:00
31/8/20 3:19:00
31/8/20 2:54:00

I would like it to group rows on a 15-minute basis and label each row if it falls into that 15-minute datetime category. So I'd like my output to look like this:

Date              Interval
31/8/20 4:56:00   Period 1
31/8/20 4:45:00   Period 1
31/8/20 4:39:00   Period 2
31/8/20 4:36:00   Period 2
31/8/20 4:34:00   Period 2
31/8/20 4:05:00   Period 4
31/8/20 4:04:00   Period 4
31/8/20 3:19:00   Period 7
31/8/20 2:54:00   Period 9

As you can see 'Period 1' contains all datetimes within the interval of 31/8/20 4:45 - 31/8/20 4:59:00. Period gaps should occur if there is no corresponding row entry for that period. This is why 'Period 4' is not 'Period 3'

I have tried the following:

df = pd.DataFrame()
df = df.sort_values(by=['Date'], ascending=False)
df['Date'] = pd.to_datetime(df['Date'])
df['Interval'] = 'Period ' + (((df.shift()['Date'] - df['Date']).dt.seconds > 900).cumsum() + 1).astype(str)

However, this returns 'Period n' within the 'Interval' column if the datetime row is within 15-minutes of the previous datetime row.

Date              Interval
31/8/20 4:56:00   Period 1
31/8/20 4:45:00   Period 2
31/8/20 4:39:00   Period 3
31/8/20 4:36:00   Period 4
31/8/20 4:34:00   Period 5
31/8/20 4:05:00   Period 6
31/8/20 4:04:00   Period 7
31/8/20 3:19:00   Period 8
31/8/20 2:54:00   Period 9

Rather I would like to label each row with a 'Period' regardless of the previous rows datetime and based on four 15-minute intervals within the hour such that Periods are sequentially labelled based on

  • dd/mm/yy h:45:00 - h:59:59,
  • dd/mm/yy h:30:00 - h:44:59,
  • dd/mm/yy h:15:00 - h:29:59,
  • dd/mm/yy h:00:00 - h:14:59

Thank you in advance.


Solution

  • Use DataFrame.groupby on Grouper with frequency of 15min, then use groupby.ngroup with optional parameter ascending=False to number each group:

    s = df.groupby(pd.Grouper(freq='15min', key='Date')).ngroup(ascending=False).add(1)
    df['Interval'] = 'Period ' + s.astype(str)
    

    Result:

                     Date  Interval
    0 2020-08-31 04:56:00  Period 1
    1 2020-08-31 04:45:00  Period 1
    2 2020-08-31 04:39:00  Period 2
    3 2020-08-31 04:36:00  Period 2
    4 2020-08-31 04:34:00  Period 2
    5 2020-08-31 04:05:00  Period 4
    6 2020-08-31 04:04:00  Period 4
    7 2020-08-31 03:19:00  Period 7
    8 2020-08-31 02:54:00  Period 9