I have this pandas crosstab:
I would like to reorder the column names so that they follow the order [ar, cs, de, en, es, fr, hi, it, ja, ko, pl, pt, ru, tr, zh, be, gl, la, nn, sa, ur, ...], i.e. so that the elements with high counts lie on the diagonal. I tried multi-indexing and rearranging the values that go into the pandas DataFrame, but could not figure it out.
I build the table with pd.crosstab like this:
orig = ['ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', ...]
pred = ['ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'tr', ...]
self.df_confusion = pd.crosstab(orig , pred )
Just reindex on both axes:
order = ['ar', 'cs', 'de', 'en', 'es', 'fr', 'hi', 'it',
         'ja', 'ko', 'pl', 'pt', 'ru', 'tr', 'zh', 'be',
         'gl', 'la', 'nn', 'sa', 'ur']
df_confusion = df_confusion.reindex(index=order, columns=order)
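If some labels in order might not occur in the data, reindex creates their rows and columns filled with NaN; pass fill_value=0 to get zeros instead: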
df_confusion = df_confusion.reindex(index=order, columns=order, fill_value=0)
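For a quick check, here is a minimal sketch of the reindex approach on toy data. The labels and the shortened order list below are invented for illustration and are not taken from the question; 'zh' is deliberately absent from the data to show what fill_value=0 does.

```python
import pandas as pd

# Toy labels, invented for illustration (not the asker's data).
orig = ['ar', 'ar', 'tr', 'de', 'de', 'tr']
pred = ['ar', 'tr', 'tr', 'de', 'ar', 'tr']

# Desired label order; 'zh' never occurs in the toy data.
order = ['ar', 'de', 'tr', 'zh']

df_confusion = pd.crosstab(orig, pred)

# Reorder both axes; labels missing from the data get a row/column of zeros.
df_confusion = df_confusion.reindex(index=order, columns=order, fill_value=0)

print(df_confusion)
```

Both axes now follow order, so correct predictions sit on the diagonal.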
Or, using a Categorical:
df_confusion = pd.crosstab(pd.Categorical(orig, categories=order, ordered=True),
                           pd.Categorical(pred, categories=order, ordered=True))
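One caveat with the Categorical variant: as far as I can tell from the pd.crosstab documentation, the default dropna=True leaves out categories that never occur in the data, so passing dropna=False is what keeps every label from order in the table. A minimal sketch on toy data (the labels and the shortened order list are invented for illustration):

```python
import pandas as pd

# Toy labels, invented for illustration (not the asker's data).
orig = ['ar', 'ar', 'tr', 'de', 'de', 'tr']
pred = ['ar', 'tr', 'tr', 'de', 'ar', 'tr']

# Desired label order; 'zh' never occurs in the toy data.
order = ['ar', 'de', 'tr', 'zh']

rows = pd.Categorical(orig, categories=order, ordered=True)
cols = pd.Categorical(pred, categories=order, ordered=True)

# dropna=False keeps categories with no observations (here 'zh') as zero rows/columns.
all_labels = pd.crosstab(rows, cols, dropna=False)

print(all_labels)
```

If only the ordering matters and unobserved labels can be left out, the call without dropna=False should be enough.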