Let's say I have a DataFrame with 3 columns, one containing the groups, and I would like to collect the unique values of the 2 other columns for each group.
Normally I would use DataFrame.groupby and apply the unique method. However, this does not work when unique is applied to more than 1 column:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param1': [1, 5, 8, np.nan, 2, 3, np.nan],
    'param2': [5, 6, 9, 10, 11, 12, 1]
})
Applying unique to 1 column:
df.groupby('group')['param1'].unique()
group
1 [1.0, 5.0]
2 [8.0]
3 [nan, 2.0, 3.0]
4 [nan]
Name: param1, dtype: object
Applying unique to 2 columns:
df.groupby('group')[['param1', 'param2']].unique()
I get an AttributeError:
AttributeError: 'DataFrameGroupBy' object has no attribute 'unique'
Instead, I would expect this DataFrame:
param1 param2
group
1 [1.0, 5.0] [5, 6]
2 [8.0] [9]
3 [nan, 2.0, 3.0] [10, 11, 12]
4 [nan] [1]
The reason for the error is that unique works only on a Series, so it is only implemented as SeriesGroupBy.unique, not on DataFrameGroupBy.
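Because SeriesGroupBy.unique does exist, one way to see the point is to call it once per selected column and join the per-column results side by side. This is a minimal sketch, assuming a reasonably recent pandas; the dict passed to pd.concat supplies the column names, so the result has the same layout as the expected DataFrame in the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param1': [1, 5, 8, np.nan, 2, 3, np.nan],
    'param2': [5, 6, 9, 10, 11, 12, 1],
})

grouped = df.groupby('group')

# Selecting a single column gives a SeriesGroupBy, which has .unique();
# build one Series of arrays per column, keyed by the column name.
per_column = {col: grouped[col].unique() for col in ['param1', 'param2']}

# Concatenate along axis=1: the dict keys become the column names and
# the rows are aligned on the shared 'group' index.
result = pd.concat(per_column, axis=1)
print(result)
```

Each cell here holds a NumPy array rather than a list; NaN is kept, just as in the single-column output.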
What works for me is calling Series.unique on each column inside agg and converting the result to a list:
df = df.groupby('group')[['param1', 'param2']].agg(lambda x: list(x.unique()))
print(df)
param1 param2
group
1 [1.0, 5.0] [5, 6]
2 [8.0] [9]
3 [nan, 2.0, 3.0] [10, 11, 12]
4 [nan] [1]
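If the NaN entries should not appear in the lists (an assumption about what you want; the expected output in the question keeps them), a possible variation is to drop them inside the lambda before calling unique. In this sketch, .tolist() also turns the NumPy scalars into plain Python floats and ints, and an all-NaN group such as group 4 ends up with an empty list for param1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'group': [1, 1, 2, 3, 3, 3, 4],
    'param1': [1, 5, 8, np.nan, 2, 3, np.nan],
    'param2': [5, 6, 9, 10, 11, 12, 1],
})

# Same per-column aggregation as above, but NaN is removed from each
# group's values before the unique values are collected into a list.
result = (
    df.groupby('group')[['param1', 'param2']]
      .agg(lambda x: x.dropna().unique().tolist())
)
print(result)
```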