python pandas dataframe indexing counter

Python: Removing Rows on Count condition


I have a problem filtering a pandas dataframe.

city 
NYC 
NYC 
NYC 
NYC 
SYD 
SYD 
SEL 
SEL
...

df.city.value_counts()

I would like to remove the rows for cities that appear fewer than 4 times, which in this example would be SYD and SEL.

What would be the way to do so without manually dropping them city by city?


Solution

  • Solution one: groupby with filter, which keeps only the groups that pass the condition

    df.groupby('city').filter(lambda x: len(x) > 3)
    Out[1743]: 
      city
    0  NYC
    1  NYC
    2  NYC
    3  NYC
    

    Solution two: groupby with transform, which gives a per-row count you can use as a boolean mask

    sub_df = df[df.groupby('city').city.transform('count') > 3].copy()
    # .copy() avoids a SettingWithCopyWarning if you later modify sub_df
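
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample data from the question and runs both approaches side by side. The name `min_count` is a placeholder of mine, and the third variant (mapping `value_counts()` back onto the column) is not from the answer above; it is just another way to get the same mask, assuming the column has no missing values.

```python
import pandas as pd

# Sample data mirroring the question: NYC x4, SYD x2, SEL x2
df = pd.DataFrame({"city": ["NYC"] * 4 + ["SYD"] * 2 + ["SEL"] * 2})

# Hypothetical name: keep cities with at least this many rows
min_count = 4

# Solution one: filter keeps every row of each group that passes the test
by_filter = df.groupby("city").filter(lambda g: len(g) >= min_count)

# Solution two: transform broadcasts each group's row count back onto its rows,
# which gives a boolean mask aligned with the original index
mask = df.groupby("city")["city"].transform("count") >= min_count
by_transform = df[mask].copy()

# Variant not in the answer above: look up each row's city in value_counts()
counts = df["city"].value_counts()
by_counts = df[df["city"].map(counts) >= min_count].copy()

print(by_filter)
```

All three keep only the NYC rows here. On large frames the transform and value_counts versions tend to be faster than filter, since filter calls the Python lambda once per group, but that is worth timing on your own data.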