Search code examples
pythonpandasdataframetransposemelt

Pandas melt multiple groups into single column


Original DataFrame:

+----+----------+----------+----------+----------+
| ID |  var1hrs |  var2hrs |  ind1var |  ind2var |
+----+----------+----------+----------+----------+
|  1 |       55 |       45 |      123 |      456 |
|  2 |       48 |       60 |      331 |      222 |
+----+----------+----------+----------+----------+

Target DataFrame:

+----+------------+------+------+
| ID |    type    |  hrs |  ind |
+----+------------+------+------+
|  1 |  primary   |   55 |  123 |
|  1 |  secondary |   45 |  456 |
|  2 |  primary   |   48 |  331 |
|  2 |  secondary |   60 |  222 |
+----+------------+------+------+

How would I go about melting multiple groups of variables into a single label column? The "1" in the variable names indicate type = "primary" and "2" indicates type = "secondary".


Solution

  • (Comments inlined)

    # set ID as the index and sort columns
    df = df.set_index('ID').sort_index(axis=1)
    
    # extract primary columns
    prim = df.filter(like='1')
    prim.columns = ['ind', 'vars']
    # extract secondary columns 
    sec = df.filter(like='2')
    sec.columns = ['ind', 'vars']
    
    # concatenation + housekeeping
    v = (pd.concat([prim, sec], keys=['primary', 'secondary'])
           .swaplevel(0, 1)
           .rename_axis(['ID', 'type'])
           .reset_index()
    )
    
    print(v)
       ID       type  ind  vars
    0   1    primary  123    55
    1   2    primary  331    48
    2   1  secondary  456    45
    3   2  secondary  222    60
    

    This is more or less one efficient way of doing it, even if the steps are a bit involved.