
Pandas: select column with most unique values


I have a pandas DataFrame and want to select the column with the most unique values. I have already counted the unique values per column with nunique(). How can I now choose the column with the highest nunique()?

This is my code so far:

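    # note: (int or float) evaluates to just int, so float columns were skipped;
    # 'number' selects all numeric dtypes
    numeric_columns = df.select_dtypes(include='number')
    unique = []
    for column in numeric_columns:
        unique.append(numeric_columns[column].nunique())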

Later I need to filter all the columns of my DataFrame based on this column (the one with the most unique values).


Solution

  • Use DataFrame.select_dtypes with np.number to keep only the numeric columns, then count the unique values per column with DataFrame.nunique and get the label of the column with the maximal count with Series.idxmax:

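    import numpy as np
    import pandas as pd

    df = pd.DataFrame({'a':[1,2,3,4],'b':[1,2,2,2], 'c':list('abcd')})
    print(df)
       a  b  c
    0  1  1  a
    1  2  2  b
    2  3  2  c
    3  4  2  d

    # keep only numeric columns (here 'a' and 'b'; 'c' holds strings)
    numeric = df.select_dtypes(include=np.number)

    # nunique() counts distinct values per column; idxmax() returns the label of the largest count
    nu = numeric.nunique().idxmax()
    print(nu)
    a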
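If you would rather finish the loop from the question, a small sketch (using the same example frame; the names `counts` and `best_from_loop` are only illustrative) pairs each column name with its unique count and picks the largest. It also shows that the label returned by `idxmax` can be used directly to select that column from the full DataFrame.

```python
import numpy as np
import pandas as pd

# same example frame as in the answer
df = pd.DataFrame({'a': [1, 2, 3, 4], 'b': [1, 2, 2, 2], 'c': list('abcd')})
numeric = df.select_dtypes(include=np.number)

# loop-style version: map each column name to its number of unique values,
# then take the key with the largest value
counts = {column: numeric[column].nunique() for column in numeric}
best_from_loop = max(counts, key=counts.get)

# vectorised version from the answer
best = numeric.nunique().idxmax()

print(counts)                 # {'a': 4, 'b': 2}
print(best_from_loop, best)   # a a

# the label indexes the original frame, e.g. to work with that column later
print(df[best].tolist())      # [1, 2, 3, 4]
```

On ties, both `max` and `idxmax` return the first matching column in column order, so the two versions agree.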