Search code examples
pythonpandasdataframefilteringmulti-index

Drop duplicates of permuted multi-index


I have a dataframe as follows:

           Correlations       adf
FITB RF        0.984395 -5.484766
WAT  SWK       0.981778 -5.465284
SWK  WAT       0.981778 -5.420976
RF   FITB      0.984395 -5.175268
MCO  BK        0.973801 -4.919812

and I want to filter through the indices such that the dataframe drops any repeated values, even if they are permuted. Therefore, the above dataframe would become

           Correlations       adf
FITB RF        0.984395 -5.484766
WAT  SWK       0.981778 -5.465284
MCO  BK        0.973801 -4.919812

I can't find an efficient means of doing this for a large dataframe. Any help is much appreciated!


Solution

  • You can leverage np.sort + pd.DataFrame.duplicated:

    m = pd.DataFrame(np.sort(df.index.tolist(), axis=1)).duplicated()
    df[~(m.values)]
    
              Correlations       adf
    FITB RF       0.984395 -5.484766
    WAT  SWK      0.981778 -5.465284
    MCO  BK       0.973801 -4.919812
    

    Or, in a similar fashion, using pd.MultiIndex.duplicated:

    m = pd.MultiIndex.from_tuples(
        [tuple(x) for x in np.sort(df.index.tolist(), axis=1)]
    ).duplicated()
    df[~m]
    
    
              Correlations       adf
    FITB RF       0.984395 -5.484766
    WAT  SWK      0.981778 -5.465284
    MCO  BK       0.973801 -4.919812