How to find the column names in a pandas DataFrame whose values are all unique, ignoring NaN?


I want to find the columns of a pandas DataFrame that contain no duplicate values, ignoring NaN.

   x   y   z
a  1   2   A
b  2   2   B
c  NaN 3   D
d  4   NaN NaN
e  NaN NaN NaN 

Apart from NaN, the columns "x" and "z" contain no duplicate values (column "y" repeats 2), so I want to select them and build a new DataFrame from them.


Solution

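  • Use nunique. It counts the distinct non-NaN values in each column, so a column has no duplicates exactly when that count equals its number of non-null values.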

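    # True for each column whose distinct non-NaN count equals its non-null count
    m = df.nunique() == df.notnull().sum()
    # Keep only those columns
    subdf = df.loc[:, m]
    subdf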
         x    z
    a  1.0    A
    b  2.0    B
    c  NaN    D
    d  4.0  NaN
    e  NaN  NaN
    
    # Names of the matching columns
    m.index[m].tolist()
    ['x', 'z']
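
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It rebuilds the frame from the question (the dtypes are an assumption, since the question only shows the printed table) and cross-checks the nunique mask against a per-column test with dropna().is_unique. The helper name unique_columns is hypothetical and only used for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (dtypes are assumed).
df = pd.DataFrame(
    {
        "x": [1, 2, np.nan, 4, np.nan],
        "y": [2, 2, 3, np.nan, np.nan],
        "z": ["A", "B", "D", np.nan, np.nan],
    },
    index=list("abcde"),
)


def unique_columns(frame):
    """Hypothetical helper: names of columns with no duplicates, ignoring NaN."""
    # nunique() skips NaN by default, so it counts distinct non-null values.
    # notnull().sum() counts all non-null values; they match only if nothing repeats.
    mask = frame.nunique() == frame.notnull().sum()
    return mask.index[mask].tolist()


cols = unique_columns(df)
subdf = df[cols]

# Cross-check with a per-column test: drop NaN, then ask whether the rest is unique.
cols_check = [c for c in df.columns if df[c].dropna().is_unique]

print(cols)  # ['x', 'z']
```

Note that a column consisting only of NaN passes both checks, because it has zero distinct values and zero non-null values. If such columns should be excluded, an extra condition on the non-null count would be needed.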