python pandas sorting reindex columnheader

How to update de value of the column headers on a pandas DataFrame?

I'm having problems on sorting my header in a way that shows all the meetings that happened on that year, than move on to the next year (M being meeting):

Given the Data frame:

Meeting     M1/2023  M1/2024  M1/2025  M1/2026  M2/2023  M2/2024  M2/2025  M2/2026  M3/2023  M3/2024 (...)
Date                                                                        
2023-01-02    13.75  11.7500      NaN      NaN    13.75  11.2500      NaN   NaN    13.75     10.9375 (...)
2023-01-03    13.75  11.7500      NaN      NaN    13.75  11.2500      NaN   NaN    13.75     10.8750 (...)
2023-01-04    13.75  11.8125      NaN      NaN    13.75  11.4375      NaN   NaN    13.75     11.0000 (...)
2023-01-05    13.75  11.7500      NaN      NaN    13.75  11.3750      NaN   NaN    13.75     11.0000 (...)

What the order should be:
M1/2023  M2/2023  M3/2023  M4/2023  M1/2024  M2/2024  M3/2024   M4/2024  M1/2025  M2/2025 (...)
Date
(...)

At first i tried to reorder using df.sort_values(), but the result was the same. My next move was to get the headers as strings so i could edit the data and add the year in front of the Meeting number, so the sort_valeus() would work:

headers_dict = {}

for i in df.columns:
    headers_dict[i] = i.split('/')[1]+'_'+i.split('/')[0]

Output:

{'M1/2023': '2023_M1',
 'M1/2024': '2024_M1',
 'M1/2025': '2025_M1',
 'M1/2026': '2026_M1',
 'M2/2023': '2023_M2',
 'M2/2024': '2024_M2',
 'M2/2025': '2025_M2', (...)}

This part worked, but I could not put the edited strings back in to the df (I tried using .reindex() here) without turning all my data in to NaN. What should I do? Is there a way to do this? Sorry for my bad english.

Solution

You have two issues, you need to swap the order of the chunks and you need to use natural sorting.

I would use str.split then sort_values with natsort as key:

from natsort import natsort_key

idx = (df.columns.to_series().str.split('/', expand=True)
         .sort_values(by=[1, 0], key=natsort_key).index
      )

out = df[idx]

Output:

            M1/2023  M2/2023  M3/2023  M1/2024  M2/2024  M3/2024  M1/2025  M2/2025  M1/2026  M2/2026
Date                                                                                                
2023-01-02    13.75    13.75    13.75  11.7500  11.2500  10.9375      NaN      NaN      NaN      NaN
2023-01-03    13.75    13.75    13.75  11.7500  11.2500  10.8750      NaN      NaN      NaN      NaN
2023-01-04    13.75    13.75    13.75  11.8125  11.4375  11.0000      NaN      NaN      NaN      NaN
2023-01-05    13.75    13.75    13.75  11.7500  11.3750  11.0000      NaN      NaN      NaN      NaN

Assuming another example:

# input
  M1/2023 M1/2024 M2/2023 M2/2024 M10/2023 M10/2024
0     NaN     NaN     NaN     NaN      NaN      NaN

# output
  M1/2023 M2/2023 M10/2023 M1/2024 M2/2024 M10/2024
0     NaN     NaN      NaN     NaN     NaN      NaN