Search code examples
pythonpandasdataframeindexingmulti-index

Create a multiindex DataFrame from existing delimited column names


I have a pandas DataFrame that looks like the following

           A_value A_avg B_value B_avg
date
2020-01-01       1     2      3      4
2020-02-01       5     6      7      8

and my goal is to create a multiindex Dataframe that looks like that:

                 A            B
                 value avg    value  avg
date
2020-01-01       1     2      3      4
2020-02-01       5     6      7      8

So the part of the column name before the '-' should be the first level of the column index and the part afterwards the second level. The first part is unstructured, the second is always the same (4 endings).

I tried to solve it with pd.wide_to_long() but I think that is the wrong path, as I don't want to change the df itself. The real df is much larger, so creating it manually is not an option. I'm stuck here and did not find a solution.


Solution

  • You can split the columns by the delimier and expand to create Multiindex:

    df.columns=df.columns.str.split("_",expand=True)