I have a dataframe in Python with 3 columns and some duplicate rows. Within the duplicate rows, all fields are the same except for one (which is null).
Example data:
ID | Country | Trades |
---|---|---|
P11 | France | 5 |
P12 | Germany | 3 |
P13 | UK | 7 |
P13 | NULL | 7 |
P14 | Croatia | 13 |
P15 | USA | 5 |
P15 | NULL | 5 |
P16 | UK | 2 |
P16 | NULL | 2 |
I want to remove the duplicates that contain the NULL value, while keeping the ones that have no NULLs in the 'Country' column. Note: it is not always the top duplicate row that has the data while the bottom contains the NULL; sometimes this is inverted (top duplicate contains the NULL while the bottom duplicate has the country data).
Hoping to clean dataframe down to get this:
ID | Country | Trades |
---|---|---|
P11 | France | 5 |
P12 | Germany | 3 |
P13 | UK | 7 |
P14 | Croatia | 13 |
P15 | USA | 5 |
P16 | UK | 2 |
Where the duplicates have been removed with no data loss (in the Country column). Many thanks in advance you legends!
Hello and welcome vegeratian_python. Here's an example of how to solve it:
# Sort by 'Country' so rows with a NULL (NaN) Country always come last
df_sorted = df.sort_values(by='Country', na_position='last')
# Drop duplicates per ID, keeping the first occurrence (the non-null row)
df_cleaned = df_sorted.drop_duplicates(subset=['ID'], keep='first')
# Optionally restore the original ID order
df_cleaned = df_cleaned.sort_values(by='ID')
print(df_cleaned)
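For reference, here's the whole thing runnable end to end with your sample data (a sketch — the column names are assumed to match your example table, so adjust if yours differ):

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NULL represented as NaN
df = pd.DataFrame({
    'ID': ['P11', 'P12', 'P13', 'P13', 'P14', 'P15', 'P15', 'P16', 'P16'],
    'Country': ['France', 'Germany', 'UK', np.nan, 'Croatia',
                'USA', np.nan, 'UK', np.nan],
    'Trades': [5, 3, 7, 7, 13, 5, 5, 2, 2],
})

# NaN rows sort to the end, so keep='first' retains the non-null row per ID
df_cleaned = (
    df.sort_values(by='Country', na_position='last')
      .drop_duplicates(subset=['ID'], keep='first')
      .sort_values(by='ID')
      .reset_index(drop=True)
)

print(df_cleaned)
```

One thing to note with this approach: if an ID only ever appears with a NULL Country (no non-null duplicate exists), the NULL row is kept rather than dropped, which is usually what you want to avoid losing data.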