pandas pandas-groupby python-3.7

Pandas unique does not work on a groupby object when applied to several columns


Let's say I have a dataframe with 3 columns, one of which contains the groups, and I would like to collect the unique values of the 2 other columns for each group.

Normally I would use the DataFrame.groupby method and apply unique to the result. However, this does not work when unique is applied to more than one column.

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param1': [1, 5, 8, np.nan, 2, 3, np.nan],
    'param2': [5, 6, 9, 10, 11, 12, 1]
})

Apply unique on 1 column:

df.groupby('group')['param1'].unique()
group
1         [1.0, 5.0]
2              [8.0]
3    [nan, 2.0, 3.0]
4              [nan]
Name: param1, dtype: object

Apply unique on 2 columns:

df.groupby('group')[['param1', 'param2']].unique()

I get an AttributeError:

AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'

Instead I would expect this dataframe:


                param1        param2
group
1           [1.0, 5.0]        [5, 6]
2                [8.0]           [9]
3      [nan, 2.0, 3.0]  [10, 11, 12]
4                [nan]           [1]

Solution

  • The error occurs because unique is defined only for Series, so only SeriesGroupBy.unique is implemented; DataFrameGroupBy has no such method.


    What works for me is applying Series.unique to each column through agg and converting the result to a list:

    df = df.groupby('group')[['param1', 'param2']].agg(lambda x: list(x.unique()))
    print(df)
                    param1        param2
    group                               
    1           [1.0, 5.0]        [5, 6]
    2                [8.0]           [9]
    3      [nan, 2.0, 3.0]  [10, 11, 12]
    4                [nan]           [1]
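
To make the reason and the workaround easier to check together, here is a minimal self-contained sketch using the question's data. The variable names (grouped, single_kind, multi_kind, uniques) are chosen here for illustration and do not come from the original post. It first shows which groupby class each kind of column selection produces, then applies the agg approach from the answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "group": [1, 1, 2, 3, 3, 3, 4],
    "param1": [1, 5, 8, np.nan, 2, 3, np.nan],
    "param2": [5, 6, 9, 10, 11, 12, 1],
})

grouped = df.groupby("group")

# Selecting a single column gives a SeriesGroupBy (which has .unique());
# selecting a list of columns gives a DataFrameGroupBy.
single_kind = type(grouped["param1"]).__name__
multi_kind = type(grouped[["param1", "param2"]]).__name__
print(single_kind, multi_kind)

# agg calls the lambda once per column and per group, passing a Series,
# so Series.unique is available; the array is converted to a plain list.
uniques = grouped[["param1", "param2"]].agg(lambda s: list(s.unique()))
print(uniques)
```

Because the result is kept in a separate variable here, the original df stays available, unlike in the snippet above where df is overwritten.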