python-3.x pandas multiple-columns categorical-data dummy-variable

How to create dummy variables only for levels above a certain count threshold in a high-cardinality column?


I have a column with high cardinality:

   Df['Education_Degree'].value_counts()

   Masters Degree in Mathematics                      5550
   Bachelors Degree in Physics                        4420
   Bacherlors Degree                                  3210
   Masters Degree in Mechanics                        2540
   Masters Degree                                     1200
   Masters Degree in Economics                        995
   .
   .
   .

   Name: Education_Degree, Length: 356, dtype: int64

I want to create dummy columns, but only for the levels whose count is above 995. Any suggestions would be much appreciated. Thank you!


Solution

  • In your case, count the levels first, then build dummies only for the rows whose level meets the threshold:

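    import pandas as pd

    s = Df['Education_Degree'].value_counts()
    # s >= 995 keeps levels with at least 995 rows; use s > 995 for strictly above
    sdumm = pd.get_dummies(Df.loc[Df['Education_Degree'].isin(s.index[s >= 995]), 'Education_Degree'], dtype=int)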

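    Then concatenate the dummies back onto the original frame. Reindexing aligns them to every row of Df, so rows whose level is below the threshold get 0 in every dummy column: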

    yourdf = pd.concat([Df, sdumm.reindex(Df.index, fill_value=0)], axis=1)
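A self-contained sketch of the same idea on made-up toy data, for anyone who wants to try it without the original DataFrame. The helper name dummies_above_threshold and the toy values are hypothetical. Instead of reindexing afterwards, it masks the rare levels to NaN with Series.where; pd.get_dummies skips NaN by default (dummy_na=False), so those rows come out as all-zero rows directly.

```python
import pandas as pd


def dummies_above_threshold(df: pd.DataFrame, col: str, min_count: int) -> pd.DataFrame:
    """Hypothetical helper: append 0/1 dummy columns for the levels of `col`
    that occur at least `min_count` times. Rows with a rarer level get 0 in
    every new column."""
    counts = df[col].value_counts()
    keep = counts.index[counts >= min_count]
    # Replace below-threshold levels with NaN; get_dummies ignores NaN by
    # default, so those rows become all-zero rows without any reindexing.
    masked = df[col].where(df[col].isin(keep))
    dummies = pd.get_dummies(masked, dtype=int)
    return pd.concat([df, dummies], axis=1)


# Hypothetical toy data with threshold 2: "A" (3 rows) and "B" (2 rows)
# qualify, "C" (1 row) does not, so only columns A and B are created.
toy = pd.DataFrame({"Education_Degree": ["A", "A", "A", "B", "B", "C"]})
out = dummies_above_threshold(toy, "Education_Degree", 2)
print(out)
```

Both variants produce the same columns; the masking version just avoids building a filtered frame and realigning it afterwards.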