python pandas time-series

How to find overlapping time start and end points?


For each ID, I would like to find the earliest measurement time before 12:00:00 and the latest measurement time after 12:00:00, so that I can choose the start and end times of the maximum overlap. Here is the sample data:

import numpy as np
import pandas as pd
import random

df = pd.DataFrame({'DATE_TIME': pd.date_range('2022-11-01', '2022-11-06 23:00:00', freq='20min'),
                   'ID': [random.randrange(1, 20) for n in range(430)]})

df['VALUE1'] = [random.randrange(110, 140) for n in range(430)]
df['VALUE2'] = [random.randrange(50, 60) for n in range(430)]
df['VALUE3'] = [random.randrange(80, 100) for n in range(430)]
df['VALUE4'] = [random.randrange(30, 50) for n in range(430)]

df['MODEL'] = [random.randrange(1, 3) for n in range(430)]

df['SOLD'] = [random.randrange(0, 2) for n in range(430)]

df['INSPECTION'] = df['DATE_TIME'].dt.day

df['MODE'] = np.select([df['INSPECTION'] == 1, df['INSPECTION'].isin([2, 3])], ['A', 'B'], 'C')

df['TIME'] = df['DATE_TIME'].dt.time
# df['TIME'] = pd.to_timedelta(df['TIME'])
df['TIME'] = df['TIME'].astype('str')


# Label each time of day as 'Morning' or 'Night'
def cycle_day_period(times: pd.Series, midnight='00:00:00', start_of_morning='06:00:00',
                     start_of_afternoon='13:00:00',
                     start_of_evening='18:00:00', end_of_evening='23:00:00', start_of_night='24:00:00'):
    # Bin edges as time-of-day strings; '24:00:00' parses to a Timedelta of 1 day
    bins = [midnight, start_of_morning, start_of_afternoon, start_of_evening, end_of_evening, start_of_night]
    # Repeated labels are only accepted because ordered=False is passed to pd.cut
    labels = ['Night', 'Morning', 'Morning', 'Night', 'Night']

    return pd.cut(
        pd.to_timedelta(times),
        bins=list(map(pd.Timedelta, bins)),
        labels=labels, right=False, ordered=False
    )


df['CYCLE_PART'] = cycle_day_period(df['TIME'], '00:00:00', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00')

My expectation is to find T_start and T_end as shown in the picture (for a single-day, 24 h measurement). Please refer to the drawing, since my wording of the problem might be confusing:

[image: drawing of T_start and T_end]


Solution

  • What you want is unclear, but assuming you want the time window that is covered by all groups, first use groupby.agg to get the min and max TIME per ID. Then aggregate again, this time taking the max of the minima and the min of the maxima:

    df.groupby('ID')['TIME'].agg(['min', 'max']).agg({'min': 'max', 'max': 'min'})
    

    If you really need to restrict the minimum to times before 12:00:00 and the maximum to times after 12:00:00:

    (df.groupby('ID')['TIME']
       .agg(min=lambda x: x[x.lt('12:00:00')].min(),
            max=lambda x: x[x.gt('12:00:00')].max())
       .agg({'min': 'max', 'max': 'min'})
    )
    

    Output:

    min    07:00:00
    max    19:40:00
    dtype: object
    

    Intermediate:

    df.groupby('ID')['TIME'].agg(['min', 'max'])
             min       max
    ID                    
    1   00:40:00  20:00:00
    2   02:20:00  23:40:00
    3   00:20:00  23:40:00
    4   01:20:00  23:20:00
    5   00:00:00  22:40:00
    6   02:00:00  21:40:00
    7   00:20:00  23:20:00
    8   00:40:00  19:40:00  # min of maxima: 19:40:00
    9   00:40:00  22:40:00
    10  00:20:00  23:20:00
    11  00:00:00  22:00:00
    12  02:20:00  23:40:00
    13  01:00:00  22:40:00
    14  00:00:00  23:00:00
    15  00:00:00  23:00:00
    16  01:00:00  23:40:00
    17  00:00:00  22:40:00
    18  00:00:00  22:00:00
    19  07:00:00  23:00:00  # max of minima: 07:00:00
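
Because the sample data in the question is random, the numbers above change from run to run. The following is a minimal sketch of the same idea on a small, fixed data set, so the result can be checked by hand. The `toy` frame, the helper column `T`, and the names `before`, `after`, `t_start` and `t_end` are made up for illustration and are not part of the original question or answer. It also assumes that comparing times as `Timedelta` values is acceptable instead of comparing the `'HH:MM:SS'` strings directly.

```python
import pandas as pd

# Hypothetical toy data: three IDs measured at different times on the same day.
toy = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 2, 3, 3, 3],
    'TIME': ['06:00:00', '11:40:00', '20:00:00',
             '07:20:00', '12:20:00', '22:40:00',
             '05:00:00', '13:00:00', '19:40:00'],
})

# Convert the strings to Timedelta so comparisons are numeric
# rather than lexicographic.
toy['T'] = pd.to_timedelta(toy['TIME'])
noon = pd.Timedelta(hours=12)

# Per ID: earliest time before noon and latest time after noon.
before = toy[toy['T'] < noon].groupby('ID')['T'].min()
after = toy[toy['T'] > noon].groupby('ID')['T'].max()

# The window shared by every ID starts at the latest of the starts
# and ends at the earliest of the ends.
t_start = before.max()
t_end = after.min()

print(t_start, t_end)
```

Here the per-ID starts are 06:00:00, 07:20:00 and 05:00:00, so `t_start` is 07:20:00; the per-ID ends are 20:00:00, 22:40:00 and 19:40:00, so `t_end` is 19:40:00. Note that an ID with no measurement on one side of noon simply drops out of `before` or `after` in this sketch, so it may be worth comparing their indexes against the full set of IDs before trusting the window.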