Search code examples
pythonpandasmelt

how to transform dataframe so that column values are row values


I have the following dataframe, which looks like the below:

df = pd.DataFrame({'fruit': ['berries','berries', 'berries', 'tropical', 
'tropical','tropical','berries','nuts'], 
           'code': [100,100,100,200,200, 300,400,500],
           'subcode': ['100A', '100B', '100C','200A', '200B','300A', 
           '400A', '500A']})


    code    fruit   subcode
  0 100     berries 100A
  1 100     berries 100B
  2 100     berries 100C
  3 200     tropica 200A
  4 200     tropical 200B
  5 300     tropical 300A
  6 400     berries 400A
  7 500     nuts    500A

I want to transform the dataframe to this format:

    code    fruit   subcode1 subcode1 subcode1
  0 100     berries 100A      100B   100C
  3 200     tropica 200A      200B
  5 300     tropical 300A
  6 400     berries 400A
  7 500     nuts    500A 

Unfortunately, I'm stuck as to how to proceed. I've consulted posts like, Unmelt Pandas DataFrame, and have combinations of stack and unstack. I suspect that some concatenation is involved, too. Would appreciate any advice to help point me in the right direction!


Solution

  • You can use groupby, take the values and convert them to series.

    df.groupby(['code','fruit'])['subcode'].apply(
             lambda x: x.values
          ).apply(pd.Series)
           .add_prefix('subcode_')
    
                    subcode_0 subcode_1 subcode_2
    code fruit                                 
    100  berries       100A      100B      100C
    200  tropical      200A      200B       NaN
    300  tropical      300A       NaN       NaN
    400  berries       400A       NaN       NaN
    500  nuts          500A       NaN       NaN