Search code examples
pandasmulti-index

Regroup columns in a df hierarchically?


I have a set of data with multiple experiences. Two for egg, two for spam :

>>> df = pd.read_csv(StringIO("""
name,egg1,egg2,spam1,spam2
foo,4,8,15,16
bar,23,42,66,83
"""))

>>> df.set_index('name')

      egg1  egg2  spam1  spam2
name
foo      4     8     15     16
bar     23    42     66     83

What I would like to get is something like this :

           egg          spam
         1     2      1      2
name
foo      4     8     15     16
bar     23    42     66     83

I read about Multiindex and indexslice, but I haven't found how to use them properly.

Any advice?


Solution

  • You can split the columns, which convert to a MultiIndex, and drop the empty level:

    (df.set_index('name')
       .pipe(lambda df: df.set_axis(df.columns
                                     .str
                                     .split(r"(\d)", expand = True)
                                     .droplevel(-1), 
                                   axis = 1, 
                                   )
            )
      )
    
         egg     spam    
           1   2    1   2
    name                 
    foo    4   8   15  16
    bar   23  42   66  83
    

    Of course, a cleaner way would be to do it in steps:

    Set name as index:

    df = df.set_index('name')
    

    Create MultiIndex via str.split, setting expand = True:

     df.columns = df.columns.str.split(r"(\d)", expand = True).droplevel(-1)
    
     df
    
         egg     spam    
           1   2    1   2
    name                 
    foo    4   8   15  16
    bar   23  42   66  83