Search code examples
pythonpandasdataframeunique

Pandas get unique values in one column based off of another column python


Here I have a dataframe like the following:

Variable    Groups
1           [0-10]
1           [0-10]
2           [0-10]
2           [0-10]
3           [0-10]
3           [10-20]
4           [10-20]
4           [10-20]
5           [10-20]
5           [10-20]

I'd like to get only unique values for Variable column, but don't want to lose any duplicates that are in different Groups, so for example this:

Variable    Groups
1           [0-10]
2           [0-10]
3           [0-10]
3           [10-20]
4           [10-20]
5           [10-20]

Note there is still a duplicate 3 because there was one in each group. I've tried

df_unique = df['Groups'].groupby(df['Variable']).unique().apply(pd.Series)

but this is just returning a complete mess. Not sure what to do, help appreciated.


Solution

  • You can use SeriesGroupBy.unique() together with .explode() and .reset_index(), as follows:

    df.groupby('Variable')['Groups'].unique().explode().reset_index()
    

    Another solution is to use GroupBy.first(), as follows:

    df.groupby(['Variable', 'Groups'], as_index=False).first()
    

    Result:

       Variable   Groups
    0         1   [0-10]
    1         2   [0-10]
    2         3   [0-10]
    3         3  [10-20]
    4         4  [10-20]
    5         5  [10-20]