Search code examples
pythonpandasdategroupingtimedelta

Label n days before and after a specific date for groups


I have a dataframe like this:

   ID                Date  Main_Date
0   1 2015-04-02 12:00:00      False
1   1 2015-05-02 15:00:00       True
2   1 2015-02-15 00:00:00      False
3   2 2015-03-06 00:00:00       True
4   2 2016-11-20 00:00:00      False
5   3 2015-04-10 00:00:00       True
6   3 2016-07-28 00:00:00      False
7   3 2018-05-21 00:00:00      False

I want to add a column with True/False values if a Date is in the range 1 month before or after the Main_Date for every 'ID' group , so my expected output is:

   ID                Date  Main_Date  Within_Range
0   1 2015-04-02 12:00:00      False          True
1   1 2015-05-02 15:00:00       True          True
2   1 2015-02-15 00:00:00      False         False
3   2 2015-03-06 00:00:00       True          True
4   2 2016-11-20 00:00:00      False         False
5   3 2015-04-10 00:00:00       True          True
6   3 2016-07-28 00:00:00      False         False
7   3 2018-05-21 00:00:00      False         False

I wrote a function for this:

def check_if_within_range(date):
        # Select the `Main_Date` for the `ID` group 
        main_date = df.loc[df['Main_Date'] == True, 'Date'].iloc[0] #This line causes the problem
        # Specify the range of dates as all dates 1 month before and after the `Main_Date`            
        month_start = main_date - datetime.timedelta(days=30)
        month_end = main_date + datetime.timedelta(days=30)
        month = pd.date_range(month_start, month_end)
        # Check if `Date` is within range            
        return (date > month_start) & (date <= month_end)

df['Within_Range'] = df['Date'].apply(lambda x:check_if_within_range(x))
print(df)

But I could not figure out how to select the Main_Date for every ID group. As it is now the first line in the function takes the first Main_Date.

Also the way I add the days results in 2015-04-02 15:00:00 being the month_start for the Main_Date 2015-05-02 15:00:00 so that 2015-04-02 12:00:00 is considered to be out of range. So I would like to specify the month_start in such a way that it covers the whole day.

This is the false output I get with the function I wrote:

   ID                Date  Main_Date  Within_Range
0   1 2015-04-02 12:00:00      False         False
1   1 2015-05-02 15:00:00       True          True
2   1 2015-02-15 00:00:00      False         False
3   2 2015-03-06 00:00:00       True         False
4   2 2016-11-20 00:00:00      False         False
5   3 2015-04-10 00:00:00       True          True
6   3 2016-07-28 00:00:00      False         False
7   3 2018-05-21 00:00:00      False         False

I also do not know if this is a good way to do it or if it could be done much easier as I am new to Python, so I am open to any suggestions.


Solution

  • import pandas as pd
    
    
    data = {
        "ID": [1, 1, 1, 2, 2, 3, 3, 3],
        "Date": pd.to_datetime([
            "2015-04-02 12:00:00", "2015-05-02 15:00:00", "2015-02-15 00:00:00", 
            "2015-03-06 00:00:00", "2016-11-20 00:00:00", "2015-04-10 00:00:00", 
            "2016-07-28 00:00:00", "2018-05-21 00:00:00"
        ]),
        "Main_Date": [False, True, False, True, False, True, False, False]
    }
    
    df = pd.DataFrame(data)
    
    def check_if_within_range(group):
        main_date = group.loc[group['Main_Date'] == True, 'Date'].iloc[0]
        main_date = main_date.normalize()
        month_start = main_date - pd.DateOffset(months=1)
        month_end = main_date + pd.DateOffset(months=1)
        group['Within_Range'] = (group['Date'] >= month_start) & (group['Date'] <= month_end)
        return group
    
    df = df.groupby('ID').apply(check_if_within_range)
    
    print(df)
    

    enter image description here