Search code examples
python-3.xpandasdataframegroup-byaggregate-functions

Python: How to use value_counts() inside .agg function in pandas?


Input dataframe df looks like:

item    row
Apple   12
Apple   12
Apple   13
Orange  13
Orange  14
Lemon   14

Output dataframe need to be

item    unique_row      nunique_row     count
Apple   {12,13}             2            {2,1}
Orange  {13,14}             2            {1,1}
Lemon   {14}                1            {1}

Tried Code:

df.groupby('item', as_index=False)['row'].agg({'unique_row': lambda x: set(x)
                                                ,'nunique_row': lambda x: len(set(x))})

So here, not sure how to add condition inside .agg function to generate column 'count'. Column 'count' represents number of value_count for each row value. Any help will be appreciated. Thank You!


Solution

  • You need to convert to list or set:

    (df.groupby('item', as_index=False)['row']
       .agg({'unique_row': lambda x: list(x.unique()),
             'nunique_row': lambda x: len(set(x)),
             'count': lambda x: list(x.value_counts(sort=False)), # or set(x.value_counts())
            })
    )
    

    output:

         item unique_row  nunique_row   count
    0   Apple   [12, 13]            2  [2, 1]
    1   Lemon       [14]            1     [1]
    2  Orange   [13, 14]            2  [1, 1]