Search code examples
pythonpandascountuniquedistinct

Python Pandas: Group by and count distinct value over all columns?


I have df

      column1  column2  column3  column4
0    name        True        True         NaN
1    name        NaN        True         NaN
2   name1        NaN        True         True 
3   name1        True        True       True 

and I would like to Group by and count distinct value over all columnsI am trying :

df.groupby('column1').nunique()

but I am receiving this error.

AttributeError: 'DataFrameGroupBy' object has no attribute 'nunique'

Anybody have a suggestion?


Solution

  • You can use stack for Series and then Series.groupby with SeriesGroupBy.nunique:

    df1 = df.set_index('column1').stack()
    
    print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack())
    

    Sample:

    print (df)
      column1 column2 column3 column4
    0    name    True    True     NaN
    1    name     NaN    True     NaN
    2   name1     NaN    True    True
    3   name1    True    True    True
    
    df1 = df.set_index('column1').stack(dropna=False)
    print (df1)
    column1         
    name     column2    True
             column3    True
             column4     NaN
             column2     NaN
             column3    True
             column4     NaN
    name1    column2     NaN
             column3    True
             column4    True
             column2    True
             column3    True
             column4    True
    dtype: object
    
    print (df1.groupby(level=[0,1]).nunique(dropna=False).unstack(fill_value=0))
             column2  column3  column4
    column1                           
    name           2        1        1
    name1          2        1        1
    
    print (df1.groupby(level=[0,1]).nunique().unstack(fill_value=0))
             column2  column3  column4
    column1                           
    name           1        1        0
    name1          1        1        1
    

    Another solution with double apply:

    print (df.groupby('column1')
             .apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique(dropna=False))))
             column2  column3  column4
    column1                           
    name           2        1        1
    name1          2        1        1
    
    print (df.groupby('column1').apply(lambda x: x.iloc[:,1:].apply(lambda y: y.nunique())))
             column2  column3  column4
    column1                           
    name           1        1        0
    name1          1        1        1