I would like to transform a dataframe of the form:
| index | 0 | 1 | 2 | 3 | 4 | ... |
|-------|-----------|------|------|------|------|-----|
| 0 | parameter | 2001 | 2001 | 2002 | 2002 | ... |
| 1 | | low | high | low | high | ... |
| 2 | foo | 1 | 2 | 7 | 8 | ... |
| 3 | bar | NaN | NaN | 12 | 13 | ... |
into a dataframe of the form:
| index | parameter | year | low | high | ... |
|-------|-----------|------|-----|------|-----|
| 0 | foo | 2001 | 1 | 2 | ... |
| 1 | foo | 2002 | 7 | 8 | ... |
| 2 | bar | 2002 | 12 | 13 | ... |
Unfortunately, I am stuck on the transformation. So far, I have tried to work only on the year columns to add a multi-index for easier pivoting (which I think might be required?):
import pandas as pd
import numpy as np
data = {
0: ['parameter', '', 'foo', 'bar'],
1: [2001, 'low', 1, np.nan],
2: [2001, 'high', 2, np.nan],
3: [2002, 'low', 7, 12],
4: [2002, 'high', 8, 13],
}
df = pd.DataFrame(data)
dfyears = df[[col for col in df.columns.unique() if isinstance(df.iloc[0][col], int)]]
multiindex = pd.MultiIndex.from_product(
[
list(dfyears.iloc[0].unique()),
list(dfyears.iloc[1].unique())
]
)
dfyears.columns = multiindex
dfyears = dfyears.drop([0, 1])
You could use pandas.MultiIndex.from_arrays
:
idx = pd.MultiIndex.from_arrays([df.iloc[0, 1:], df.iloc[1, 1:]],
names=('year', None))
out = (df.iloc[2:].set_index(0).set_axis(idx, axis=1)
.rename_axis('parameter').stack(0).reset_index()
)
Output:
parameter year low high
0 foo 2001 1 2
1 foo 2002 7 8
2 bar 2002 12 13