Search code examples
pythonpandasdataframeconcatenation

Concat multiple dataframes in Python reorders the columns by alphabet


I have NOT a problem but a behaviour I would like to understand.

I have an Excel with 12 sheets, I want them all combined in 1 dataframe, with the sheetname as extra column. I can do it but have 1 strange behaviour I can't explain : the final order of columns of the dataframe is changed into and alphabetical order.

This is my flow : I read through 10 sheets of an excel.
Every sheet is read in a dataframe, all sheets have the same shape (73 rows x 50 columns) and the same hierarchical column. (2 levels). When a sheet is read in, I append it to a dataframe, then I read the next sheet etc..

The result is a dataframe of 876rows and 50 columns. That is good. But - as mentioned - with a different column order than the original set.

WORKAROUND : I save the order in a list and reapply it at the final dataframe, but that doesn't explain the behaviour.

If I try to replay this with a sample code (see underneath) but I cannot replay the behaviour, my sample code works as expected and preserves the order. Any idea how I can start to investigate this behaviour?

import pandas as pd
import numpy as np

t0=pd.DataFrame()
t1=pd.DataFrame(np.arange(12).reshape((4,3)), 
                columns=[['one','two','four'],['five','six','eight']])
t1['Month']='January'
t1.set_index(['Month'], inplace=True)

t2=pd.DataFrame(np.arange(12,24).reshape((4,3)),
                columns=[['one','two','four'],['five','six','eight']])
t2['Month']='February'
t2.set_index(['Month'], inplace=True)

t3=pd.concat([t0,t1])
t3=pd.concat([t3,t2])
t3

Solution

  • Do you have an old version of Pandas?

    Older defaulted .concat()'s sort arg to unset (None), which sorted based upon whether the data was already sorted, while the latest defaults to False

    Try explicitly setting sort=False and/or upgrading Pandas if possible!


    • Pandas 0.23.4 docs

      sort : boolean, default None

      Sort non-concatenation axis if it is not already aligned when join is ‘outer’. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.

      Explicitly pass sort=True to silence the warning and sort. Explicitly pass sort=False to silence the warning and not sort.

      This has no effect when join='inner', which already preserves the order of the non-concatenation axis.

      New in version 0.23.0.

    • Pandas latest stable and v1.2.1 (time of writing)

      sort : bool, default False

      Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis.

      Changed in version 1.0.0: Changed to not sort by default.