Here I have a dataframe like the following:
Variable Groups
1 [0-10]
1 [0-10]
2 [0-10]
2 [0-10]
3 [0-10]
3 [10-20]
4 [10-20]
4 [10-20]
5 [10-20]
5 [10-20]
I'd like to keep only the unique values in the Variable column, but without losing any duplicates that fall in different Groups. So, for example, this:
Variable Groups
1 [0-10]
2 [0-10]
3 [0-10]
3 [10-20]
4 [10-20]
5 [10-20]
Note there is still a duplicate 3 because there was one in each group. I've tried
df_unique = df['Groups'].groupby(df['Variable']).unique().apply(pd.Series)
but this just returns a complete mess. I'm not sure what to do; any help is appreciated.
You can use SeriesGroupBy.unique() together with .explode() and .reset_index(), as follows:
df.groupby('Variable')['Groups'].unique().explode().reset_index()
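To see what each step contributes, here is a runnable sketch with the sample data reconstructed from the question (the bracketed ranges are assumed to be stored as strings):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Variable': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'Groups': ['[0-10]', '[0-10]', '[0-10]', '[0-10]', '[0-10]',
               '[10-20]', '[10-20]', '[10-20]', '[10-20]', '[10-20]'],
})

# .unique() collects the distinct Groups per Variable into an array per row;
# Variable 3 maps to two entries, all others to one
per_var = df.groupby('Variable')['Groups'].unique()

# .explode() flattens each array into its own row, and .reset_index()
# turns the Variable index back into a regular column
result = per_var.explode().reset_index()
```

The key point is that .unique() preserves one row per Variable holding an array, so .explode() is what restores the "one row per (Variable, Group) pair" shape.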
Another solution is to use GroupBy.first(), as follows:
df.groupby(['Variable', 'Groups'], as_index=False).first()
Result:
Variable Groups
0 1 [0-10]
1 2 [0-10]
2 3 [0-10]
3 3 [10-20]
4 4 [10-20]
5 5 [10-20]
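Both approaches produce the same six rows. A quick check, again with the sample data reconstructed from the question (Groups assumed to be strings):

```python
import pandas as pd

# Sample data reconstructed from the question
df = pd.DataFrame({
    'Variable': [1, 1, 2, 2, 3, 3, 4, 4, 5, 5],
    'Groups': ['[0-10]', '[0-10]', '[0-10]', '[0-10]', '[0-10]',
               '[10-20]', '[10-20]', '[10-20]', '[10-20]', '[10-20]'],
})

# Solution 1: collect the unique Groups per Variable, then flatten
a = df.groupby('Variable')['Groups'].unique().explode().reset_index()

# Solution 2: group by both columns and take the first row of each group
b = df.groupby(['Variable', 'Groups'], as_index=False).first()

print(a.values.tolist() == b.values.tolist())
```

Since only these two columns are involved, plain df.drop_duplicates() would also give the same rows here; GroupBy.first() generalizes to frames with additional columns, where it keeps the first value of each extra column per group.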