I have a dataframe as follows:
Correlations adf
FITB RF 0.984395 -5.484766
WAT SWK 0.981778 -5.465284
SWK WAT 0.981778 -5.420976
RF FITB 0.984395 -5.175268
MCO BK 0.973801 -4.919812
and I want to filter through the indices such that the dataframe drops any repeated values, even if they are permuted. Therefore, the above dataframe would become
Correlations adf
FITB RF 0.984395 -5.484766
WAT SWK 0.981778 -5.465284
MCO BK 0.973801 -4.919812
I can't find an efficient means of doing this for a large dataframe. Any help is much appreciated!
You can leverage np.sort
+ pd.DataFrame.duplicated
:
m = pd.DataFrame(np.sort(df.index.tolist(), axis=1)).duplicated()
df[~(m.values)]
Correlations adf
FITB RF 0.984395 -5.484766
WAT SWK 0.981778 -5.465284
MCO BK 0.973801 -4.919812
Or, in a similar fashion, using pd.MultiIndex.duplicated
:
m = pd.MultiIndex.from_tuples(
[tuple(x) for x in np.sort(df.index.tolist(), axis=1)]
).duplicated()
df[~m]
Correlations adf
FITB RF 0.984395 -5.484766
WAT SWK 0.981778 -5.465284
MCO BK 0.973801 -4.919812