Search code examples
pandasdatetimeconditional-statementsapply

Pandas Create a New Column Based on Multiple Other Columns with Datetime


I am trying to create a new columns based on two other column values with one column being datetime. Below is an example of the data. I would like to get the result shown in the second table.

Site Name Date
Westwood 2022-11-15
Westwood 2022-11-16
Northend 2021-08-04
Northend 2021-08-05
Northend 2021-08-06

I'm hoping to get the below result.

Site Name Date Active_Ind
Westwood 2022-11-15 0
Westwood 2022-11-16 0
Northend 2021-08-06 0
Northend 2021-08-05 1
Northend 2021-08-04 1

Here is the what I have tried so far but I keep getting the error '<' not supported between instances of 'Timestamp' and 'int'

def conditions(df):
    if (df['Site Name']=='Northend') & (df['Date'] < 2021-08-06):
        return 1
    else:
        return 0

df['Active_Ind']=df.apply(conditions,axis=1)

Solution

  • Example

    data = {'Site Name': {0: 'Westwood', 1: 'Westwood', 2: 'Northend', 3: 'Northend', 4: 'Northend'},
            'Date': {0: '2022-11-15', 1: '2022-11-16', 2: '2021-08-04', 3: '2021-08-05', 4: '2021-08-06'}}
    df = pd.DataFrame(data)
    

    df

        Site Name   Date
    0   Westwood    2022-11-15
    1   Westwood    2022-11-16
    2   Northend    2021-08-04
    3   Northend    2021-08-05
    4   Northend    2021-08-06
    

    Code

    ordering of 1st table and 2nd table is different. I'll think of it as typo

    df['Date'] = pd.to_datetime(df['Date'])
    s = df['Site Name'].eq('Northend') & df['Date'].lt(pd.Timestamp('2021-08-06'))
    df['Active_ind'] = s.astype('int')
    

    df

        Site Name   Date        Active_ind
    0   Westwood    2022-11-15  0
    1   Westwood    2022-11-16  0
    2   Northend    2021-08-04  1
    3   Northend    2021-08-05  1
    4   Northend    2021-08-06  0