python pandas dataframe group-by conditional-statements

Pandas - Conditional grouping

I have the following DataFrame containing employment history: (1900-01-01 is used to store a null end date)

id	company	job_title	start_date	end_date	is_current_employer
A	Google	Manager	2023-05-06	1900-01-01	1
A	Amazon	SWE	2021-01-19	2023-02-02	0
A	Meta	SWE	2017-11-13	2020-10-24	0
B	Tesla	Research	2020-06-06	2023-07-14	0
B	Microsoft	Data Eng.	2017-04-17	2019-12-05	0
C	Adobe	Intern	2022-10-12	1900-01-01	1
C	TikTok	Intern	2023-03-23	1900-01-01	1

I'm hoping to return a DataFrame with one record for each ID, with that ID's most recent employment, like so:

id	company	job_title	start_date	end_date	is_current_employer
A	Google	Manager	2023-05-06	1900-01-01	1
B	Tesla	Research	2020-06-06	2023-07-14	0
C	Adobe	Intern	2022-10-12	1900-01-01	1

If a candidate (ID) has only one current employer, that record should be returned. If a candidate has no current employers, the most recent employer should be returned (max start date). If a candidate has multiple current employers, the record with the earlier start date should be returned.

How can I concisely do this in Python?

Solution

You can use group by and first() to do this. First, you must sort by start date to get the most recent employment:

df.sort_values(["start_date"], ascending=False).groupby("id").first()

Output:

id	company	job_title	start_date	end_date	is_current_employer
A	Google	Manager	2023-05-06	1900-01-01	1
B	Tesla	Research	2020-06-06	2023-07-14	0
C	TikTok	Intern	2023-03-23	1900-01-01	1