Search code examples
pythonpython-3.xpandasdataframe

Python Pandas Dataframe: Make Rows from Multiple Column Into Multiple Column


I have dataframe like this:

Product Colour Grade Price Size
Apple Yellow A 1000 3
Apple Green B 500 4
Banana Yellow B 2000 5
Banana Green A 3000 6

I want to convert the dataframe to:

Product Price_Yellow_A Size_Yellow_A Price_Yellow_B Size_Yellow_B Price_Green_A Size_Green_A Price_Green_B Size_Green_B
Apple 1000 3 0 0 0 0 500 4
Banana 0 0 2000 5 3000 6 0 0

How to do this?


Solution

  • You can use pd.pivot() to achieve this, with some other intermediary steps.

    First create a column of the combined Colour and Grade like this:

    df['Combined'] = df['Colour'] + '_' + df['Grade']
    
    0    Yellow_A
    1     Green_B
    2    Yellow_B
    3     Green_A
    Name: Combined, dtype: object
    

    Then pivot() the dataframe, keeping Product as your index, using the new Combined column as columns and the Price and Size columns as values. I added .fillna(0) to put 0s in the blanks. Like this:

    new_df = df.pivot(index='Product', columns=['Combined'], values=['Price', 'Size']).fillna(0)
    
               Price                              Size                          
    Combined Green_A Green_B Yellow_A Yellow_B Green_A Green_B Yellow_A Yellow_B
    Product                                                                     
    Apple        0.0   500.0   1000.0      0.0     0.0     4.0      3.0      0.0
    Banana    3000.0     0.0      0.0   2000.0     6.0     0.0      0.0      5.0
    

    This returns the desired table, but with mutli-level columns, so we can use a bit of list comprehension to rename the columns to the desired column names, like this:

    new_df.columns = [col[0]+'_'+col[1] for col in new_df.columns]
    
             Price_Green_A  Price_Green_B  Price_Yellow_A  Price_Yellow_B  Size_Green_A  Size_Green_B  Size_Yellow_A  Size_Yellow_B
    Product                                                                                                                        
    Apple              0.0          500.0          1000.0             0.0           0.0           4.0            3.0            0.0
    Banana          3000.0            0.0             0.0          2000.0           6.0           0.0            0.0            5.0
    

    Full code:

    df['Combined'] = df['Colour'] + '_' + df['Grade']
    new_df = df.pivot(index='Product', columns=['Combined'], values=['Price', 'Size']).fillna(0)
    new_df.columns = [col[0]+'_'+col[1] for col in new_df.columns]