I have the following dataframe, which looks like the below:
df = pd.DataFrame({'fruit': ['berries','berries', 'berries', 'tropical',
'tropical','tropical','berries','nuts'],
'code': [100,100,100,200,200, 300,400,500],
'subcode': ['100A', '100B', '100C','200A', '200B','300A',
'400A', '500A']})
code fruit subcode
0 100 berries 100A
1 100 berries 100B
2 100 berries 100C
3 200 tropica 200A
4 200 tropical 200B
5 300 tropical 300A
6 400 berries 400A
7 500 nuts 500A
I want to transform the dataframe to this format:
code fruit subcode1 subcode1 subcode1
0 100 berries 100A 100B 100C
3 200 tropica 200A 200B
5 300 tropical 300A
6 400 berries 400A
7 500 nuts 500A
Unfortunately, I'm stuck as to how to proceed. I've consulted posts like, Unmelt Pandas DataFrame, and have combinations of stack and unstack. I suspect that some concatenation is involved, too. Would appreciate any advice to help point me in the right direction!
You can use groupby
, take the values and convert them to series.
df.groupby(['code','fruit'])['subcode'].apply(
lambda x: x.values
).apply(pd.Series)
.add_prefix('subcode_')
subcode_0 subcode_1 subcode_2
code fruit
100 berries 100A 100B 100C
200 tropical 200A 200B NaN
300 tropical 300A NaN NaN
400 berries 400A NaN NaN
500 nuts 500A NaN NaN