I have a table I'm trying to reshape in a particular way.
import pandas as pd
df = pd.DataFrame({'race': ['one', 'one', 'one', 'two', 'two', 'two'],
'type': ['D', 'K', 'G', 'D', 'D', 'K'],
'item': ['x', 'y', 'z', 'q', 'x', 'y'],
'level': [1, 2, 1, 6, 2, 3]})
df
The resulting dataframe:
race type item level
0 one D x 1
1 one K y 2
2 one G z 1
3 two D q 6
4 two D x 2
5 two K y 3
I would like to reshape it in this format:
D K G
item level item level item level
race
one x 1 y 2 z 1
two q 6 y 3 NaN NaN
two x 2 NaN NaN NaN NaN
item
is unique within race
but it can appear in multiple different races.race
column or index must expand to fit the number of items within the race. In the example above, there are two 'D' items in race 'two' so race 'two' repeats twice in 2 rows to accommodate both items. If there
were 5 'K' items in race 'two', race 'two' would need to repeat 5
times.How can I achieve my desired table shape?
I've already tried:
df.pivot(index='race', columns='type', values=['level', 'item'])
which gives error:
ValueError: Index contains duplicate entries, cannot reshape
Is there another way with pd.pivot
, pd.groupby
, pd.pivot_table
, or pd.crosstab
or another pandas or dataframe method that can work?
You must first deduplicate by race/type using groupby.cumcount
:
(df.assign(n=df.groupby(['race', 'type']).cumcount())
.pivot(index=('race', 'n'), columns='type', values=['level', 'item'])
.sort_index(level='type', axis=1).swaplevel(axis=1)
)
Output:
type D G K
item level item level item level
race n
one 0 x 1 z 1 y 2
two 0 q 6 NaN NaN y 3
1 x 2 NaN NaN NaN NaN