Search code examples
pythonpandassortingreindexcolumnheader

How to update de value of the column headers on a pandas DataFrame?


I'm having problems on sorting my header in a way that shows all the meetings that happened on that year, than move on to the next year (M being meeting):

Given the Data frame:

Meeting     M1/2023  M1/2024  M1/2025  M1/2026  M2/2023  M2/2024  M2/2025  M2/2026  M3/2023  M3/2024 (...)
Date                                                                        
2023-01-02    13.75  11.7500      NaN      NaN    13.75  11.2500      NaN   NaN    13.75     10.9375 (...)
2023-01-03    13.75  11.7500      NaN      NaN    13.75  11.2500      NaN   NaN    13.75     10.8750 (...)
2023-01-04    13.75  11.8125      NaN      NaN    13.75  11.4375      NaN   NaN    13.75     11.0000 (...)
2023-01-05    13.75  11.7500      NaN      NaN    13.75  11.3750      NaN   NaN    13.75     11.0000 (...)

What the order should be:
M1/2023  M2/2023  M3/2023  M4/2023  M1/2024  M2/2024  M3/2024   M4/2024  M1/2025  M2/2025 (...)
Date
(...)

At first i tried to reorder using df.sort_values(), but the result was the same. My next move was to get the headers as strings so i could edit the data and add the year in front of the Meeting number, so the sort_valeus() would work:

headers_dict = {}

for i in df.columns:
    headers_dict[i] = i.split('/')[1]+'_'+i.split('/')[0]

Output:

{'M1/2023': '2023_M1',
 'M1/2024': '2024_M1',
 'M1/2025': '2025_M1',
 'M1/2026': '2026_M1',
 'M2/2023': '2023_M2',
 'M2/2024': '2024_M2',
 'M2/2025': '2025_M2', (...)}

This part worked, but I could not put the edited strings back in to the df (I tried using .reindex() here) without turning all my data in to NaN. What should I do? Is there a way to do this? Sorry for my bad english.


Solution

  • You have two issues, you need to swap the order of the chunks and you need to use natural sorting.

    I would use str.split then sort_values with natsort as key:

    from natsort import natsort_key
    
    idx = (df.columns.to_series().str.split('/', expand=True)
             .sort_values(by=[1, 0], key=natsort_key).index
          )
    
    out = df[idx]
    

    Output:

                M1/2023  M2/2023  M3/2023  M1/2024  M2/2024  M3/2024  M1/2025  M2/2025  M1/2026  M2/2026
    Date                                                                                                
    2023-01-02    13.75    13.75    13.75  11.7500  11.2500  10.9375      NaN      NaN      NaN      NaN
    2023-01-03    13.75    13.75    13.75  11.7500  11.2500  10.8750      NaN      NaN      NaN      NaN
    2023-01-04    13.75    13.75    13.75  11.8125  11.4375  11.0000      NaN      NaN      NaN      NaN
    2023-01-05    13.75    13.75    13.75  11.7500  11.3750  11.0000      NaN      NaN      NaN      NaN
    

    Assuming another example:

    # input
      M1/2023 M1/2024 M2/2023 M2/2024 M10/2023 M10/2024
    0     NaN     NaN     NaN     NaN      NaN      NaN
    
    # output
      M1/2023 M2/2023 M10/2023 M1/2024 M2/2024 M10/2024
    0     NaN     NaN      NaN     NaN     NaN      NaN