Converting multi column header df from wide to long format in python

I would like to convert an excel file from wide format into long format.

I'm reading an excel file which has not only two rows of headers, it also includes merged cells in the header. Input Example:

| Task | Name | May,2022  | Jun,2022  | Jul,2022  |
|      |      |status|type|status|type|status|type|
| ---- | ---- |------|----|------|----|------|----|
| 1    | Tom  |1     |AB  |1     |DT  |0     |AB  | 
| 2    | John |0     |DT  |1     |CC  |0     |GH  |

Desired Output:

Task	Name	Date	status	type
1	Tom	May,2022	1	AB
1	Tom	Jun,2022	1	DT
1	Tom	Jul,2022	0	AB
2	John	May,2022	0	DT
2	John	Jun,2022	1	CC
2	John	Jul,2022	0	GH

What i have:

import pandas as pd
df = pd.read_excel(path)
df = df.melt(id_vars=["Task","Name"],
     var_name="Date",
     value_name="Value")

I don't know how to deal with the multi header. I tried combining the two headers.

import pandas as pd
df = pd.read_excel(path, header[1,2])
df.columns = df.columns.to_series().ffill().values
df.columns.value[0]='Task'
df.columns.value[1]='Name'
df = df.melt(id_vars=["Task","Name"],
     var_name="Date",
     value_name="Value")

This results in having something like: (I'm fine with the date transformation)

Task	Name	(2022-05-01 00:00:00, status)	(2022-05-01 00:00:00, type)
1	Tom	1	AB
2	John	0	DT

The melt function returns a KeyError 'Task'. Even if this would work i would somehow need to split the date from the status and type into seperate columns. Someone knows how to deal with such data?

Solution

You can use set_index to preserve Task and Name columns then stack Date column and finally reset_index to get expected output:

>>> (df.set_index(['Task', 'Name']).rename_axis(columns=['Date', None])
       .stack(level=0).reset_index())

   Task  Name      Date  status type
0     1   Tom  Jul,2022       0   AB
1     1   Tom  Jun,2022       1   DT
2     1   Tom  May,2022       1   AB
3     2  John  Jul,2022       0   GH
4     2  John  Jun,2022       1   CC
5     2  John  May,2022       0   DT

Minimal Reproducible Example:

data = {'index': [0, 1],
        'columns': [('Task', ''),
         ('Name', ''),
         ('May,2022', 'status'),
         ('May,2022', 'type'),
         ('Jun,2022', 'status'),
         ('Jun,2022', 'type'),
         ('Jul,2022', 'status'),
         ('Jul,2022', 'type')],
        'data': [[1, 'Tom', 1, 'AB', 1, 'DT', 0, 'AB'],
                 [2, 'John', 0, 'DT', 1, 'CC', 0, 'GH']],
        'index_names': [None],
        'column_names': [None, None]}
df = pd.DataFrame.from_dict(data, orient='tight')