
Delete entire rows from a DataFrame based on how often the column's value occurs


I have the following df:

import pandas as pd

d = {'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'], 'age': [3, 4, 9, 10, 8, 5, 8, 9]}

df_1 = pd.DataFrame(data=d)


My goal is to remove every row from df_1 whose 'animal' value occurs 3 times or more. In this case the counts are lion: 3, shark: 2, cat: 2, dog: 1, so all the lion rows should be removed.

How do I approach this problem? I tried iterating over the rows but got stuck. Is there a Series method for this?


Solution

  • Try:

    m = df_1['animal'].value_counts().ge(3)
    # Boolean Series indexed by animal: True where the value occurs 3 or more times
    

    Finally:

    out = df_1[~df_1['animal'].isin(m[m].index)]
    # m[m].index holds the animals to drop; keep the rows whose animal is not among them
    

    Output of out:

        animal  age
    1   dog     4
    2   cat     9
    4   shark   8
    5   cat     5
    7   shark   9
    

    If needed, use the reset_index() method to renumber the rows from 0:

    out=out.reset_index(drop=True)
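
As an alternative to the value_counts() + isin() approach above, here is a minimal self-contained sketch that uses groupby(...).transform('size') instead. This is a different technique from the one in the answer, shown only for comparison; the variable name counts is made up for illustration, and the DataFrame is the one from the question.

```python
import pandas as pd

# Sample data copied from the question.
df_1 = pd.DataFrame({
    'animal': ['lion', 'dog', 'cat', 'lion', 'shark', 'cat', 'lion', 'shark'],
    'age': [3, 4, 9, 10, 8, 5, 8, 9],
})

# For each row, the number of rows that share its 'animal' value.
# The result is aligned with df_1's index, so it can be used directly as a mask.
counts = df_1.groupby('animal')['animal'].transform('size')

# Keep only rows whose 'animal' value occurs fewer than 3 times,
# then renumber the remaining rows from 0.
out = df_1[counts < 3].reset_index(drop=True)

print(out)
```

Because counts has one entry per row, the threshold is applied as a plain comparison and no intermediate list of animals to drop is needed. Which version reads better is mostly a matter of taste.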