I am trying to create a new columns based on two other column values with one column being datetime. Below is an example of the data. I would like to get the result shown in the second table.
Site Name | Date |
---|---|
Westwood | 2022-11-15 |
Westwood | 2022-11-16 |
Northend | 2021-08-04 |
Northend | 2021-08-05 |
Northend | 2021-08-06 |
I'm hoping to get the below result.
Site Name | Date | Active_Ind |
---|---|---|
Westwood | 2022-11-15 | 0 |
Westwood | 2022-11-16 | 0 |
Northend | 2021-08-06 | 0 |
Northend | 2021-08-05 | 1 |
Northend | 2021-08-04 | 1 |
Here is the what I have tried so far but I keep getting the error '<' not supported between instances of 'Timestamp' and 'int'
def conditions(df):
if (df['Site Name']=='Northend') & (df['Date'] < 2021-08-06):
return 1
else:
return 0
df['Active_Ind']=df.apply(conditions,axis=1)
Example
data = {'Site Name': {0: 'Westwood', 1: 'Westwood', 2: 'Northend', 3: 'Northend', 4: 'Northend'},
'Date': {0: '2022-11-15', 1: '2022-11-16', 2: '2021-08-04', 3: '2021-08-05', 4: '2021-08-06'}}
df = pd.DataFrame(data)
df
Site Name Date
0 Westwood 2022-11-15
1 Westwood 2022-11-16
2 Northend 2021-08-04
3 Northend 2021-08-05
4 Northend 2021-08-06
Code
ordering of 1st table and 2nd table is different. I'll think of it as typo
df['Date'] = pd.to_datetime(df['Date'])
s = df['Site Name'].eq('Northend') & df['Date'].lt(pd.Timestamp('2021-08-06'))
df['Active_ind'] = s.astype('int')
df
Site Name Date Active_ind
0 Westwood 2022-11-15 0
1 Westwood 2022-11-16 0
2 Northend 2021-08-04 1
3 Northend 2021-08-05 1
4 Northend 2021-08-06 0