
Sort Pandas columns by number of unique groups


I'm trying to sort a section of the categorical data from the Kaggle Housing Prices dataset, ordering the columns by the number of unique categories each contains. To avoid cluttering the question, I've included a small slice of the DataFrame (80 columns in total) to demonstrate the problem.

    LotConfig    Street    MSZoning
0   Inside       Pave      RL
1   FR2          Pave      RL
2   Corner       Grvl      RL

I'm aiming to sort the columns to achieve the following result.

    MSZoning    Street    LotConfig    
0   RL          Pave      Inside       
1   RL          Pave      FR2          
2   RL          Grvl      Corner      

I'm currently using the following code.

sorted_categories = sorted(df.columns, key=lambda col: df[col].nunique())
df = df[sorted_categories]

Is there a better way to solve this problem? Am I missing a pandas method to achieve this?


Solution

  • You can call nunique directly and index your initial DataFrame using the result:

    u = df.nunique().sort_values().index
    df[u]
    

      MSZoning Street LotConfig
    0       RL   Pave    Inside
    1       RL   Pave       FR2
    2       RL   Grvl    Corner
    

    df.nunique() will return a Series with the number of unique values in each column.

    >>> df.nunique()
    LotConfig    3
    Street       2
    MSZoning     1
    dtype: int64
    

    However, on your small example your approach is actually faster than mine, albeit a bit more verbose. I would benchmark both on your actual DataFrame: your method isn't incorrect, and if it performs better there, go with it. That said, I'd expect my method to be faster on a large frame, since it computes all the counts in a single nunique call rather than invoking nunique separately for each column.
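    Putting it together, here is a minimal self-contained sketch using a frame mirroring the question's excerpt (the data is reconstructed from the sample above):

    ```python
    import pandas as pd

    # Small frame matching the question's excerpt
    df = pd.DataFrame({
        "LotConfig": ["Inside", "FR2", "Corner"],   # 3 unique values
        "Street":    ["Pave", "Pave", "Grvl"],      # 2 unique values
        "MSZoning":  ["RL", "RL", "RL"],            # 1 unique value
    })

    # Count unique values per column, sort ascending, and reindex the columns
    order = df.nunique().sort_values().index
    sorted_df = df[order]

    print(list(sorted_df.columns))  # ['MSZoning', 'Street', 'LotConfig']
    ```

    Passing `ascending=False` to `sort_values` would instead put the column with the most categories first.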