python pandas numpy comparison

Find rows with the same values swapped between two columns


I want to find rows in which the two columns hold the same pair of values, only swapped between the columns (for example 3-b / b-3 and a-3 / 3-a, where a and b stand for the values 1 and 2). The first occurrence of a pair should get 0 and the second occurrence should be marked with 1, so that I can then exclude the duplicates from the data frame. Ideally this should work without loops, since there are a lot of rows.

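import numpy as np
import pandas as pd

# a and b mark the values that show up in swapped pairs
a, b = 1, 2

table = pd.DataFrame({'id_1': [a, 2, 2, b, 3, 3],
                      'id_2': [3, 4, 5, 3, b, a]})

# Expected result: the second occurrence of a swapped pair gets 1
Result_table = pd.DataFrame({'id_1': [1, 2, 2, 2, 3, 3],
                             'id_2': [3, 4, 5, 3, 2, 1],
                             'Result': [0, 0, 0, 0, 1, 1]})

>>> Result_table
   id_1  id_2  Result
0     1     3       0
1     2     4       0
2     2     5       0
3     2     3       0
4     3     2       1
5     3     1       1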

Solution

  • You can build a Series by applying frozenset to each row, group the rows by it, and number the repeats within each group with cumcount:

    >>> df
       id_1  id_2
    0     1     3
    1     2     4
    2     2     5
    3     2     3
    4     3     2
    5     3     1
    >>> df["Result"] = df.groupby(df.agg(frozenset, axis=1)).cumcount()
    >>> df
       id_1  id_2  Result
    0     1     3       0
    1     2     4       0
    2     2     5       0
    3     2     3       0
    4     3     2       1
    5     3     1       1
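
    If a strict 0/1 flag is wanted, one possible alternative (not the method above, but a different technique) is to sort the two values within each row with NumPy and mark repeats with duplicated(). This is only a sketch: it assumes both columns hold values of one mutually comparable type, and the column names "low" and "high" are made up for illustration. Unlike cumcount, which would give 2, 3, ... to a third and later repeat, duplicated() flags every repeat after the first as 1.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"id_1": [1, 2, 2, 2, 3, 3],
                   "id_2": [3, 4, 5, 3, 2, 1]})

# Sort the two values within each row so that (3, 2) and (2, 3) share one key.
# Assumption: the values in both columns can be compared with each other.
pairs = np.sort(df[["id_1", "id_2"]].to_numpy(), axis=1)

# "low" and "high" are hypothetical helper column names.
keys = pd.DataFrame(pairs, index=df.index, columns=["low", "high"])

# duplicated() marks every occurrence of a key after the first one.
df["Result"] = keys.duplicated().astype(int)

# Keep only the first occurrence of each unordered pair.
deduplicated = df[df["Result"] == 0]
```

    With the sample data this should flag rows 4 and 5 and leave rows 0 to 3 in deduplicated. Note that rows that are exact copies of each other (same order) are flagged as well, which is the same behaviour as the frozenset grouping.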