Search code examples
pythonpandasmulti-indexmelt

Melt a bunch of multiIndexed columns while keeping a single 'index' column


I have a big table with rows representing observations. A subset of my columns can be grouped by two meta-categories so I have represented these using a multiindex. However, The multiindex is not applicable to all columns. So all the remaining columns have index labels only in the first level.

I want to apply melt() to some of these multidimensional columns and then merge() them back with my main table later by an index column.

Here is a MRE of the part I'm stuck on. I have a data frame like this:

df = pd.DataFrame({
  ('INDEX',): [1,2,3],
  ('a','x'): ['ww','rt','pb'],
  ('a','y'): [88,97,12],
  ('b','x'): ['ew','tr','cv'],
  ('b','y'): [14,42,67],
  ('c','x'): ['wq','fg','dg'],
  ('c','y'): [65,78,46]})

df
  INDEX   a       b       c
    NaN   x   y   x   y   x   y
0     1  ww  88  ew  14  wq  65
1     2  rt  97  tr  42  fg  78
2     3  pb  12  cv  67  dg  46

Now I'd like for this:

df.melt(id_vars=('INDEX',))

...to result in this:

INDEX thing_1 thing_2 value
    1       a       x    ww
    2       a       x    rt
    3       a       x    pb
    1       a       y    88
    2       a       y    97
    3       a       y    12
    1       b       x    ew
    2       b       x    tr
    3       b       x    cv
    1       b       y    14
    2       b       y    42
    3       b       y    67
    1       c       x    wq
    2       c       x    fg
    3       c       x    dg
    1       c       y    65
    2       c       y    78
    3       c       y    46

But I get a long traceback ending in Exception: Data must be 1-dimensional.

When I first started attempting this I was trying to supply value_vars but was having no luck; I later realized that del df['INDEX']; df.melt() works with no trouble except of course that I wanted to keep my INDEX column.

I have also tried some variants such as putting 'INDEX' in a 1-tuple, a 2-tuple ending in np.nan and slice(None), a list, etc. I get the same complaint about 1-dimensional data, or for some cases, ValueError: id_vars must be a list of tuples when columns are a MultiIndex.

What's the right way to go about this? I found a solution after a bunch of trial and error which I will post below, but it seems like it ought to have been simpler or more elegant. Thanks.


Solution

  • You may use empty string as 2nd level of INDEX

    df = pd.DataFrame({
      ('INDEX',''): [1,2,3],
      ('a','x'): ['ww','rt','pb'],
      ('a','y'): [88,97,12],
      ('b','x'): ['ew','tr','cv'],
      ('b','y'): [14,42,67],
      ('c','x'): ['wq','fg','dg'],
      ('c','y'): [65,78,46]})
    
    In [198]: df
    Out[198]:
      INDEX   a       b       c
              x   y   x   y   x   y
    0     1  ww  88  ew  14  wq  65
    1     2  rt  97  tr  42  fg  78
    2     3  pb  12  cv  67  dg  46
    
    df.melt(id_vars='INDEX')
    
    Out[200]:
        INDEX variable_0 variable_1 value
    0       1          a          x    ww
    1       2          a          x    rt
    2       3          a          x    pb
    3       1          a          y    88
    4       2          a          y    97
    5       3          a          y    12
    6       1          b          x    ew
    7       2          b          x    tr
    8       3          b          x    cv
    9       1          b          y    14
    10      2          b          y    42
    11      3          b          y    67
    12      1          c          x    wq
    13      2          c          x    fg
    14      3          c          x    dg
    15      1          c          y    65
    16      2          c          y    78
    17      3          c          y    46