I have the following DataFrame:
import pandas as pd
data = {
"cat": ['A'] * 3 + ['B'] * 2,
"val1": [10, 9, 12, 20, 25],
"val2": [1, 3, 2, 6, 7],
"idx": [0, 1, 5, 1, 2]
}
df = pd.DataFrame(data)
df.set_index('idx', inplace=True)
which gives
cat val1 val2
idx
0 A 10 1
1 A 9 3
5 A 12 2
1 B 20 6
2 B 25 7
I want to transform it into a DataFrame with a column MultiIndex
A B
val1 val2 val1 val2
idx
0 10 1 NA NA
1 9 3 20 6
2 NA NA 25 7
5 12 2 NA NA
Is there a way to do this without manually splitting and horizontally concatenating the table?
You can try with df.pivot
, swaplevel
, and sort_index
. Add df.rename_axis
to get rid of the added column name (i.e. 'cat').
out = (
df.pivot(columns='cat')
.swaplevel(0, 1, axis=1)
.sort_index(axis=1, level=0)
.rename_axis(columns=(None, None))
)
out
A B
val1 val2 val1 val2
idx
0 10.0 1.0 NaN NaN
1 9.0 3.0 20.0 6.0
2 NaN NaN 25.0 7.0
5 12.0 2.0 NaN NaN
If idx
is not yet your index from the start, you can skip df.set_index('idx', inplace=True)
and simply use: df.pivot(index='idx', columns='cat')
.