pandas, data-analysis

A function for replacing a string with a category based on its value_counts


The column 'Model' has 290 unique values and holds the model name of each car.

datano['Model'].describe(include='all')

 count        3854
 unique        290
 top       E-Class
 freq          181
 Name: Model, dtype: object

E-Class                181
Vito                   154
525                     51
Rav 4                   50
Camry                  127
Caddy                  110

The models can be grouped into 3 categories: high selling, moderate selling and low selling cars.

  • Models with a frequency above 100 are high selling cars
  • Models with a frequency between 50 and 100 are moderate selling cars
  • All other models are low selling cars

Can this be implemented in code? For example, every cell containing 'Caddy' should be replaced by 'high selling car'.

Thanks...


Solution

  • You can do the following.

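    # df is the DataFrame from the question (datano there).
    def selling_cat(x):
      n = x.count()  # number of rows in this model's group
      if n > 100:
        return 'high'
      elif 50 < n <= 100:
        return 'med'
      else:
        return 'low'

    # Group by the 'Model' column (capitalised as in the question);
    # transform broadcasts each group's label back to all of its rows.
    df['selling'] = df.groupby('Model')['Model'].transform(selling_cat)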
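As a possible alternative to the groupby approach above, the frequencies can be looked up with `value_counts` and binned with `pd.cut`, which avoids calling a Python function once per group. This is only a sketch: the sample DataFrame is made up for illustration (the counts for Rav 4 and 525 mirror the question), and the bin edges assume the same thresholds as the answer above (more than 100 is high, 51 to 100 is med, everything else is low).

```python
import pandas as pd

# Hypothetical sample data standing in for the question's DataFrame.
df = pd.DataFrame(
    {'Model': ['E-Class'] * 181 + ['Camry'] * 127 + ['525'] * 51 + ['Rav 4'] * 50}
)

# Look up, for every row, how often that row's model occurs in the column.
freq = df['Model'].map(df['Model'].value_counts())

# pd.cut uses right-inclusive bins by default:
# (0, 50] is low, (50, 100] is med, (100, inf] is high.
df['selling'] = pd.cut(
    freq,
    bins=[0, 50, 100, float('inf')],
    labels=['low', 'med', 'high'],
)
```

The resulting 'selling' column has a categorical dtype; `df['selling'].astype(str)` converts it to plain strings if that is preferred.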