python pandas filter assign lines-of-code

Set column as yes for first Start Date

I have the following dataframe:

ID  Job     Start Date
1   Driver  1951-01-01
1   Cleaner 2000-05-01
1   Staff   1951-01-01
2   Staff   2000-05-02
2   Staff2  2000-05-01
2   Cleaner 2000-04-01
5   Driver  1951-01-01

I need to create a column named "Primary?" that has the value "Yes" or "No" depending of the Start Date

For each ID, set the column to "Yes" for the earliest "Start Date" for the ID. If there is 2 rows tied with same "Start Date" pick one to set to "Yes" (can be the first one that appears)
All other rows set to "No" for the ID

In that case, each ID would have a row with a "Yes" (just 1 single row) and all other rows to "No" resulting in this dataframe:

ID  Job     Start Date  Primary?
1   Driver  1951-01-01  Yes
1   Cleaner 2000-05-01  No
1   Staff   1951-01-01  No
2   Staff   2000-05-02  No
2   Staff2  2000-05-01  Yes
2   Cleaner 2000-04-01  Yes
5   Driver  1951-01-01  Yes

What is the best way to do it?

Solution

# set the primary start-date row as True/False, when its a min for the ID

df['Primary']= df['Start_Date'].eq(df.groupby(['ID'])['Start_Date'].transform(min))

# identify the duplicates start-dates
df.loc[df.duplicated(subset=['ID','Primary'], keep='first'),'Primary' ] = False
df
# Map True/False to Yes/No
df['Primary']=df['Primary'].map({True: 'Yes', False: 'No'})
df

+----+-----+----------+-------------+---------+
|    | ID  |   Job    | Start_Date  | Primary |
+----+-----+----------+-------------+---------+
| 0  |  1  | Driver   | 1951-01-01  | Yes     |
| 1  |  1  | Cleaner  | 2000-05-01  | No      |
| 2  |  1  | Staff    | 1951-01-01  | No      |
| 3  |  2  | Staff    | 2000-05-02  | No      |
| 4  |  2  | Staff2   | 2000-05-01  | No      |
| 5  |  2  | Cleaner  | 2000-04-01  | Yes     |
| 6  |  5  | Driver   | 1951-01-01  | Yes     |
+----+-----+----------+-------------+---------+