Search code examples
pythonpandasdataframepivot-table

How to sort column and row names in pandas crosstab?


I have this pandas crosstab:

enter image description here

I would like to reorder the column names in such a way, that it goes like [ar, cs, de, en, es, fr, hi, it, ja, ko, pl, pt, ru, tr, zh, be, gl, la, nn, sa, ur, ...] -> eg., those elemets with high numbers are at the diagonal. I tried multiindexing, rearranging those values that go into the pandas dataframe, but cannot figure it out.

I use the pd.crosstab like

orig = ['ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', ...]
pred = ['ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'ar', 'tr', ...]
self.df_confusion = pd.crosstab(orig , pred )

Solution

  • Just reindex on both axes:

    order = ['ar', 'cs', 'de', 'en', 'es', 'fr', 'hi', 'it',
             'ja', 'ko', 'pl', 'pt', 'ru', 'tr', 'zh', 'be',
             'gl', 'la', 'nn', 'sa', 'ur']
    
    df_confusion = df_confusion.reindex(index=order, columns=order)
    

    If you want to add potentially missing values:

    df_confusion = df_confusion.reindex(index=order, columns=order, fill_value=0)
    

    Or, using a Categorical:

    df_confusion = pd.crosstab(pd.Categorical(orig, categories=order, ordered=True),
                               pd.Categorical(pred, categories=order, ordered=True))