Search code examples
pythonpandasdataframepivot-table

Pivot/Transform Pandas DataFrame (with MultiIndex)


I would like to transform a dataframe of the form:

| index | 0         | 1    | 2    | 3    | 4    | ... |
|-------|-----------|------|------|------|------|-----|
| 0     | parameter | 2001 | 2001 | 2002 | 2002 | ... |
| 1     |           | low  | high | low  | high | ... |
| 2     | foo       | 1    | 2    | 7    | 8    | ... |
| 3     | bar       | NaN  | NaN  | 12   | 13   | ... |

into a dataframe of the form:

| index | parameter | year | low | high | ... |
|-------|-----------|------|-----|------|-----|
| 0     | foo       | 2001 | 1   | 2    | ... |
| 1     | foo       | 2002 | 7   | 8    | ... |
| 2     | bar       | 2002 | 12  | 13   | ... |

Unfortunately, I am stuck on the transformation. So far, I have tried to work only on the year columns to add a multi-index for easier pivoting (which I think might be required?):

import pandas as pd
import numpy as np

data = {
    0: ['parameter', '', 'foo', 'bar'],
    1: [2001, 'low', 1, np.nan],
    2: [2001, 'high', 2, np.nan],
    3: [2002, 'low', 7, 12],
    4: [2002, 'high', 8, 13],
}

df = pd.DataFrame(data)

dfyears = df[[col for col in df.columns.unique() if isinstance(df.iloc[0][col], int)]]

multiindex = pd.MultiIndex.from_product(
    [
        list(dfyears.iloc[0].unique()),
        list(dfyears.iloc[1].unique())
    ]
)

dfyears.columns = multiindex
dfyears = dfyears.drop([0, 1])

Solution

  • You could use pandas.MultiIndex.from_arrays:

    idx = pd.MultiIndex.from_arrays([df.iloc[0, 1:], df.iloc[1, 1:]],
                                    names=('year', None))
    
    out = (df.iloc[2:].set_index(0).set_axis(idx, axis=1)
             .rename_axis('parameter').stack(0).reset_index()
          )
    

    Output:

      parameter  year low high
    0       foo  2001   1    2
    1       foo  2002   7    8
    2       bar  2002  12   13