I want to remove duplicates in each row for the column animals.
I need something like this post, but in Python. I can't figure this out right now for some reason, and I'm hitting a block.
Remove duplicate records in dataframe
I have tried using drop_duplicates, unique, nunique, etc. No luck.
df.drop_duplicates(subset=None, keep="first", inplace=False)
df
df = pd.DataFrame({'animals': ['pink pig, pink pig, pink pig', 'brown cow, brown cow', 'pink pig, black cow', 'brown horse, pink pig, brown cow, black cow, brown cow']})
#input:
animals
0 pink pig, pink pig, pink pig
1 brown cow, brown cow
2 pink pig, black cow
3 brown horse, pink pig, brown cow, black cow, brown cow
#I would like the output to look like this:
animals
0 pink pig
1 brown cow
2 pink pig, black cow
3 brown horse, pink pig, brown cow, black cow
This does it:
import pandas as pd
df = pd.DataFrame({'animals': ['pink pig, pink pig, pink pig', 'brown cow, brown cow', 'pink pig, black cow', 'brown horse, pink pig, brown cow, black cow, brown cow']})
# split each string on ', ', drop duplicates with set(), then join back into one string
df['animals2'] = df.animals.apply(lambda x: ', '.join(list(set(x.split(', ')))))
Output:
0 pink pig
1 brown cow
2 pink pig, black cow
3 brown cow, brown horse, pink pig, black cow
Explanation:
I turned your strings into a list by splitting on ', '. Then I turned the list into a set to remove duplicates. Then I turned the set back into a list and joined it into a single string again. Please tell me if something isn't clear!
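One thing to watch: set() doesn't preserve the original order of the animals, which is why row 3 comes out as 'brown cow, brown horse, pink pig, black cow' rather than the order in your desired output. If order matters, here is a minimal variant (assuming Python 3.7+, where dicts keep insertion order) that uses dict.fromkeys instead of set:
import pandas as pd
df = pd.DataFrame({'animals': ['pink pig, pink pig, pink pig', 'brown cow, brown cow', 'pink pig, black cow', 'brown horse, pink pig, brown cow, black cow, brown cow']})
# dict.fromkeys keeps only the first occurrence of each animal, in its original position
df['animals2'] = df.animals.apply(lambda x: ', '.join(dict.fromkeys(x.split(', '))))
With that, row 3 becomes 'brown horse, pink pig, brown cow, black cow', matching the output you asked for.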