I have a dataframe like this:
ID Date Main_Date
0 1 2015-04-02 12:00:00 False
1 1 2015-05-02 15:00:00 True
2 1 2015-02-15 00:00:00 False
3 2 2015-03-06 00:00:00 True
4 2 2016-11-20 00:00:00 False
5 3 2015-04-10 00:00:00 True
6 3 2016-07-28 00:00:00 False
7 3 2018-05-21 00:00:00 False
I want to add a column with True/False
values if a Date
is in the range 1 month before or after the Main_Date
for every 'ID'
group , so my expected output is:
ID Date Main_Date Within_Range
0 1 2015-04-02 12:00:00 False True
1 1 2015-05-02 15:00:00 True True
2 1 2015-02-15 00:00:00 False False
3 2 2015-03-06 00:00:00 True True
4 2 2016-11-20 00:00:00 False False
5 3 2015-04-10 00:00:00 True True
6 3 2016-07-28 00:00:00 False False
7 3 2018-05-21 00:00:00 False False
I wrote a function for this:
def check_if_within_range(date):
# Select the `Main_Date` for the `ID` group
main_date = df.loc[df['Main_Date'] == True, 'Date'].iloc[0] #This line causes the problem
# Specify the range of dates as all dates 1 month before and after the `Main_Date`
month_start = main_date - datetime.timedelta(days=30)
month_end = main_date + datetime.timedelta(days=30)
month = pd.date_range(month_start, month_end)
# Check if `Date` is within range
return (date > month_start) & (date <= month_end)
df['Within_Range'] = df['Date'].apply(lambda x:check_if_within_range(x))
print(df)
But I could not figure out how to select the Main_Date
for every ID
group. As it is now the first line in the function takes the first Main_Date
.
Also the way I add the days results in 2015-04-02 15:00:00
being the month_start
for the Main_Date
2015-05-02 15:00:00
so that 2015-04-02 12:00:00
is considered to be out of range. So I would like to specify the month_start
in such a way that it covers the whole day.
This is the false output I get with the function I wrote:
ID Date Main_Date Within_Range
0 1 2015-04-02 12:00:00 False False
1 1 2015-05-02 15:00:00 True True
2 1 2015-02-15 00:00:00 False False
3 2 2015-03-06 00:00:00 True False
4 2 2016-11-20 00:00:00 False False
5 3 2015-04-10 00:00:00 True True
6 3 2016-07-28 00:00:00 False False
7 3 2018-05-21 00:00:00 False False
I also do not know if this is a good way to do it or if it could be done much easier as I am new to Python, so I am open to any suggestions.
import pandas as pd
data = {
"ID": [1, 1, 1, 2, 2, 3, 3, 3],
"Date": pd.to_datetime([
"2015-04-02 12:00:00", "2015-05-02 15:00:00", "2015-02-15 00:00:00",
"2015-03-06 00:00:00", "2016-11-20 00:00:00", "2015-04-10 00:00:00",
"2016-07-28 00:00:00", "2018-05-21 00:00:00"
]),
"Main_Date": [False, True, False, True, False, True, False, False]
}
df = pd.DataFrame(data)
def check_if_within_range(group):
main_date = group.loc[group['Main_Date'] == True, 'Date'].iloc[0]
main_date = main_date.normalize()
month_start = main_date - pd.DateOffset(months=1)
month_end = main_date + pd.DateOffset(months=1)
group['Within_Range'] = (group['Date'] >= month_start) & (group['Date'] <= month_end)
return group
df = df.groupby('ID').apply(check_if_within_range)
print(df)