A function for replacing strings with a category based on value_counts

The column 'Model', which holds the model name of each car, has 290 unique values.

datano['Model'].describe(include='all')

 count        3854
 unique        290
 top       E-Class
 freq          181
 Name: Model, dtype: object

E-Class                181
Vito                   154
525                     51
Rav 4                   50
Camry                  127
Caddy                  110

There can be 3 categories: high selling, moderate selling and low selling cars.
-) Models with a frequency above 100 are classified as high selling cars
-) Models with a frequency between 50 and 100 are moderate selling
-) All other models are low selling cars

Can this idea be implemented in code? For example, every cell containing 'Caddy' should be replaced by 'high selling car'.

Thanks...

Solution

You can group the rows by model and use transform to give every row a label based on the size of its group.

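def selling_cat(x):
    # x holds all rows of one model, so count() is that model's frequency
    n = x.count()
    if n > 100:
        return 'high'
    elif 50 < n <= 100:
        return 'med'
    else:
        return 'low'

# df is the question's datano frame; the column name is 'Model' (capital M)
df['selling'] = df.groupby('Model')['Model'].transform(selling_cat)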
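
As an alternative to the groupby approach, the same labels can probably be produced without a Python-level function by mapping each model to its frequency and binning the result. This is only a sketch: the sample frame is made up (the counts for Caddy, Rav 4 and 525 are taken from the question, 'Golf' is a hypothetical extra model), and the bin edges assume that exactly 50 counts as low and exactly 100 as med, as in the function above.

```python
import pandas as pd

# Hypothetical sample data standing in for the question's frame.
df = pd.DataFrame(
    {"Model": ["Caddy"] * 110 + ["Rav 4"] * 50 + ["525"] * 51 + ["Golf"] * 3}
)

# Frequency of each model, mapped back onto every row.
counts = df["Model"].map(df["Model"].value_counts())

# Right-closed bins: (0, 50] -> low, (50, 100] -> med, (100, inf] -> high.
df["selling"] = pd.cut(
    counts,
    bins=[0, 50, 100, float("inf")],
    labels=["low", "med", "high"],
)

# One row per model to inspect the assigned category.
print(df.drop_duplicates("Model"))
```

The result is a categorical column. If the model names themselves should be overwritten, as the question asks, assigning the result back to df['Model'] instead of a new 'selling' column should work the same way.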