Search code examples
pythonpandasfilterassignlines-of-code

Set column as yes for first Start Date


I have the following dataframe:

ID  Job     Start Date
1   Driver  1951-01-01
1   Cleaner 2000-05-01
1   Staff   1951-01-01
2   Staff   2000-05-02
2   Staff2  2000-05-01
2   Cleaner 2000-04-01
5   Driver  1951-01-01

I need to create a column named "Primary?" that has the value "Yes" or "No" depending of the Start Date

  • For each ID, set the column to "Yes" for the earliest "Start Date" for the ID. If there is 2 rows tied with same "Start Date" pick one to set to "Yes" (can be the first one that appears)

  • All other rows set to "No" for the ID

In that case, each ID would have a row with a "Yes" (just 1 single row) and all other rows to "No" resulting in this dataframe:

ID  Job     Start Date  Primary?
1   Driver  1951-01-01  Yes
1   Cleaner 2000-05-01  No
1   Staff   1951-01-01  No
2   Staff   2000-05-02  No
2   Staff2  2000-05-01  Yes
2   Cleaner 2000-04-01  Yes
5   Driver  1951-01-01  Yes

What is the best way to do it?


Solution

  • # set the primary start-date row as True/False, when its a min for the ID
    
    df['Primary']= df['Start_Date'].eq(df.groupby(['ID'])['Start_Date'].transform(min))
    
    # identify the duplicates start-dates
    df.loc[df.duplicated(subset=['ID','Primary'], keep='first'),'Primary' ] = False
    df
    # Map True/False to Yes/No
    df['Primary']=df['Primary'].map({True: 'Yes', False: 'No'})
    df
    
    
    +----+-----+----------+-------------+---------+
    |    | ID  |   Job    | Start_Date  | Primary |
    +----+-----+----------+-------------+---------+
    | 0  |  1  | Driver   | 1951-01-01  | Yes     |
    | 1  |  1  | Cleaner  | 2000-05-01  | No      |
    | 2  |  1  | Staff    | 1951-01-01  | No      |
    | 3  |  2  | Staff    | 2000-05-02  | No      |
    | 4  |  2  | Staff2   | 2000-05-01  | No      |
    | 5  |  2  | Cleaner  | 2000-04-01  | Yes     |
    | 6  |  5  | Driver   | 1951-01-01  | Yes     |
    +----+-----+----------+-------------+---------+