python-3.x pandas sorting multi-index lexicographic

MultiIndex DataFrame sorting order after columns stacking to Series

Problems during sort_index after DataFrame stack application:

I have an MS Excel singlesheet file which looks like: link to table screenshot

I'v read it in DataFrame and received an expected result:

df_test = pd.read_excel(io = path_test_xlsx, header = 0, index_col = 0) 
       US  AR   QA
Date                   
2015-01-01   1  10  100
2016-01-01   2  20  200
2017-01-01   3  30  300

Then I stacked it to Series like this:

ser_test = df_test.stack()
ser_test.index.names = ['Date', 'Country']
Date        Country
2015-01-01  US           1
            AR          10
            QA         100
2016-01-01  US           2
            AR          20
            QA         200
2017-01-01  US           3
            AR          30
            QA         300

Next step is to sort Series by Country and then by Date. So I tried this:

print(ser_test.sort_index(level = ['Country', 'Date']))
Date        Country
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300

Seems to me the problem is in stacking procedure, because next list of manipulations drove me to success:

df_test_reseted = ser_test.reset_index(level = 'Country')
ser_test_reseted = df_test_reseted.set_index('Country', append = True).squeeze()
print(ser_test_reseted.sort_index(level = ['Country', 'Date']))
Date        Country
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3

Is stack procedure really leads to ignoring lexicographical order or I've done something wrong?

Solution

It seems bug because sorting is by labels, not by levels. If recreate MultiIndex it working nice:

print (ser_test.index.labels)
[[0, 0, 0, 1, 1, 1, 2, 2, 2], [0, 1, 2, 0, 1, 2, 0, 1, 2]]

print (ser_test.index.levels)
[['2015-01-01', '2016-01-01', '2017-01-01'], ['US', 'AR', 'QA']]

ser_test.index = pd.MultiIndex.from_tuples(ser_test.index.tolist(), names=['Date', 'Country'])
print(ser_test.sort_index(level = ['Country', 'Date']))
Date        Country
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3
dtype: int64

Another idea is use this trick - convert Series to one column DataFrame and select by 0 in last step:

print(ser_test.to_frame().sort_index(level=['Country', 'Date'])[0])
Date        Country
2015-01-01  AR          10
2016-01-01  AR          20
2017-01-01  AR          30
2015-01-01  QA         100
2016-01-01  QA         200
2017-01-01  QA         300
2015-01-01  US           1
2016-01-01  US           2
2017-01-01  US           3
Name: 0, dtype: int64