I have the following df:
import pandas as pd
d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'], 'age': [3, 4, 9, 10, 8, 5, 8, 9]}
df_1 = pd.DataFrame(data=d)
My goal is to remove every row from df_1 whose 'animal' value appears 3 or more times in that column. In this case the counts are lion: 3, shark: 2, cat: 2, dog: 1, so all the lion rows should be removed.
How do I approach this problem? I tried iterating over the rows but got stuck. Is there a Series method for this?
Try:
# Boolean Series indexed by animal: True where the value occurs 3 or more times
m = df_1['animal'].value_counts().ge(3)
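To see what that mask holds for the sample data, here is a small self-contained sketch. The names counts and to_drop are only illustrative and are not used elsewhere in this answer.

```python
import pandas as pd

d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'],
     'age': [3, 4, 9, 10, 8, 5, 8, 9]}
df_1 = pd.DataFrame(data=d)

counts = df_1['animal'].value_counts()  # occurrences of each animal
m = counts.ge(3)                        # True where the count is >= 3
to_drop = m[m].index.tolist()           # labels whose mask value is True
print(to_drop)  # ['lion']
```

Only lion reaches the threshold, so it is the only label that survives indexing the mask with itself.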
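Finally, keep only the rows whose animal is not one of those values:
# m[m].index holds the animals flagged True; ~isin(...) keeps every other row
out = df_1[~df_1['animal'].isin(m[m].index)]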
Output of out:
  animal  age
1    dog    4
2    cat    9
4  shark    8
5    cat    5
7  shark    9
If needed, use the reset_index() method to renumber the rows from 0:
out = out.reset_index(drop=True)
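As an alternative to the value_counts() approach, the same filter can probably be written with groupby() and transform('size'), which gives each row the size of its own group. This is only a sketch under the same sample data; group_size and out_alt are names made up for the example.

```python
import pandas as pd

d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'],
     'age': [3, 4, 9, 10, 8, 5, 8, 9]}
df_1 = pd.DataFrame(data=d)

# For every row, the number of rows sharing its 'animal' value,
# aligned to the original index
group_size = df_1.groupby('animal')['animal'].transform('size')

# Keep rows whose animal occurs fewer than 3 times, then renumber from 0
out_alt = df_1[group_size < 3].reset_index(drop=True)
print(out_alt)
```

Because the transformed Series lines up with df_1 row by row, it can be used directly as a boolean filter, with no isin() lookup needed.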