python sorting conditional-statements grouping

create new df column based on grouping and conditions in PYTHON

my dataframe looks like the bottom and i want to create a new column called "remove" that would be true/false based on the following conditions.

i only want to keep one row for each "url" and want to determine which to keep based on this order:

keep the one with a higher score
if score is same then go to track "gre" and "tca" are favored over "oke"
if they are equal at the track then go to "size" and take the bigger one

I would like the new column to be true or false based on if it should be removed or not.

thanks in advance!!

	name	age	track	score	size	url	url_count
0	timmy	10	gre	0.982	41	target.com	2
1	chester	12	tca	0.251	12	michaels.com	3
2	trixie	10	oke	0.0879	15	macys.com	2
3	aj	9	gre	0.985	20	michaels.com	3
4	cosmo	150	tca	0.982	328	target.com	2
5	wanda	145	gre	0.879	550	macys.com	2
6	jorgen	500	oke	0.432	51	michaels.com	3

Solution

Use sort_values with your sort criteria and a custom_sort as the key re-order your rows with the first being the highest ranked. Then we use duplicated on only the url column to flag all but the first instance of a url as to be removed:

def custom_sort(x):
    if x.name == 'track':
        return x.map({'gre': 0, 'tca': 0, 'oke': 1})
    
    else:
        return x

df['remove'] = df.sort_values(['url', 'score', 'track', 'size'], key = custom_sort, ascending = False).duplicated(subset = 'url')