Search code examples
pythonpandasdataframe

Create new dataframe based on distinct combinations of column values in Python


I have a pandas dataframe:

d = {'col1': ['Date1', 'Date1', 'Date1', 'Date2', 'Date2', 'Date2', 'Date3', 'Date3', 'Date3', 'Date4', 'Date4', 'Date4'], 
     'col2': ['Date2', 'Date3', 'Date4', 'Date1', 'Date3', 'Date4', 'Date1', 'Date2', 'Date4', 'Date1', 'Date2', 'Date3']}
df = pd.DataFrame(data=d)

enter image description here

How do I get a unique list of combinations of the values in the columns, like this?

enter image description here

I have tried nested for loops using df.itertuples() and df.drop() and am just getting lost.


Solution

  • You can sort the col1/col2 and then drop duplicates:

    df["tmp"] = df[["col1", "col2"]].apply(sorted, axis=1)
    df = df.drop_duplicates(subset="tmp").drop(columns="tmp")
    
    print(df)
    

    Prints:

        col1   col2
    0  Date1  Date2
    1  Date1  Date3
    2  Date1  Date4
    4  Date2  Date3
    5  Date2  Date4
    8  Date3  Date4