I have NOT a problem but a behaviour I would like to understand.
I have an Excel with 12 sheets, I want them all combined in 1 dataframe, with the sheetname as extra column. I can do it but have 1 strange behaviour I can't explain : the final order of columns of the dataframe is changed into and alphabetical order.
This is my flow :
I read through 10 sheets of an excel.
Every sheet is read in a dataframe,
all sheets have the same shape (73 rows x 50 columns) and the same hierarchical column. (2 levels).
When a sheet is read in, I append it to a dataframe, then I read the next sheet etc..
The result is a dataframe of 876rows and 50 columns. That is good. But - as mentioned - with a different column order than the original set.
WORKAROUND : I save the order in a list and reapply it at the final dataframe, but that doesn't explain the behaviour.
If I try to replay this with a sample code (see underneath) but I cannot replay the behaviour, my sample code works as expected and preserves the order. Any idea how I can start to investigate this behaviour?
import pandas as pd
import numpy as np
t0=pd.DataFrame()
t1=pd.DataFrame(np.arange(12).reshape((4,3)),
columns=[['one','two','four'],['five','six','eight']])
t1['Month']='January'
t1.set_index(['Month'], inplace=True)
t2=pd.DataFrame(np.arange(12,24).reshape((4,3)),
columns=[['one','two','four'],['five','six','eight']])
t2['Month']='February'
t2.set_index(['Month'], inplace=True)
t3=pd.concat([t0,t1])
t3=pd.concat([t3,t2])
t3
Do you have an old version of Pandas?
Older defaulted .concat()
's sort
arg to unset (None
), which sorted based upon whether the data was already sorted, while the latest defaults to False
Try explicitly setting sort=False
and/or upgrading Pandas if possible!
sort : boolean, default None
Sort non-concatenation axis if it is not already aligned when join is ‘outer’. The current default of sorting is deprecated and will change to not-sorting in a future version of pandas.
Explicitly pass
sort=True
to silence the warning and sort. Explicitly passsort=False
to silence the warning and not sort.This has no effect when
join='inner'
, which already preserves the order of the non-concatenation axis.New in version 0.23.0.
Pandas latest stable and v1.2.1 (time of writing)
sort : bool, default False
Sort non-concatenation axis if it is not already aligned when join is ‘outer’. This has no effect when join='inner', which already preserves the order of the non-concatenation axis.
Changed in version 1.0.0: Changed to not sort by default.