Search code examples
pythonpandasdataframepermute

How to combine pandas df so that rows with permuted col1 and col2 values can be merged containing only one combination & summing a count column


I want to combine the permuted col1 and col2 values into one row. containing only the first combination & summing the count column of both. Is there an easy way to do this in pandas?

The example data frame and output: For example, in the below data frame I want to combine rows with values A, B, and B, A and sum their count column. Same for rows with values C, D, and D, C and sum their count values as well. I want to keep the rest of the rows in the data frame as is.

INPUT:

col1 col2 count
A B 3
C D 2
B A 5
E F 2
G H 8
D C 5
I J 4

OUTPUT:

col1 col2 count
A B 8
C D 7
E F 2
G H 8
I J 4

Solution

  • You can .groupby according sorted col1/col2:

    x = (
        df.groupby(df[["col1", "col2"]].apply(lambda x: tuple(sorted(x)), 1))
        .agg({"col1": "first", "col2": "first", "count": "sum"})
        .reset_index(drop=True)
    )
    print(x)
    

    Prints:

      col1 col2  count
    0    A    B      8
    1    C    D      7
    2    E    F      2
    3    G    H      8
    4    I    J      4