I have dataframe like this:
Product | Colour | Grade | Price | Size |
---|---|---|---|---|
Apple | Yellow | A | 1000 | 3 |
Apple | Green | B | 500 | 4 |
Banana | Yellow | B | 2000 | 5 |
Banana | Green | A | 3000 | 6 |
I want to convert the dataframe to:
Product | Price_Yellow_A | Size_Yellow_A | Price_Yellow_B | Size_Yellow_B | Price_Green_A | Size_Green_A | Price_Green_B | Size_Green_B |
---|---|---|---|---|---|---|---|---|
Apple | 1000 | 3 | 0 | 0 | 0 | 0 | 500 | 4 |
Banana | 0 | 0 | 2000 | 5 | 3000 | 6 | 0 | 0 |
How to do this?
You can use pd.pivot()
to achieve this, with some other intermediary steps.
First create a column of the combined Colour
and Grade
like this:
df['Combined'] = df['Colour'] + '_' + df['Grade']
0 Yellow_A
1 Green_B
2 Yellow_B
3 Green_A
Name: Combined, dtype: object
Then pivot()
the dataframe, keeping Product
as your index, using the new Combined
column as columns and the Price
and Size
columns as values. I added .fillna(0)
to put 0s in the blanks. Like this:
new_df = df.pivot(index='Product', columns=['Combined'], values=['Price', 'Size']).fillna(0)
Price Size
Combined Green_A Green_B Yellow_A Yellow_B Green_A Green_B Yellow_A Yellow_B
Product
Apple 0.0 500.0 1000.0 0.0 0.0 4.0 3.0 0.0
Banana 3000.0 0.0 0.0 2000.0 6.0 0.0 0.0 5.0
This returns the desired table, but with mutli-level columns, so we can use a bit of list comprehension to rename the columns to the desired column names, like this:
new_df.columns = [col[0]+'_'+col[1] for col in new_df.columns]
Price_Green_A Price_Green_B Price_Yellow_A Price_Yellow_B Size_Green_A Size_Green_B Size_Yellow_A Size_Yellow_B
Product
Apple 0.0 500.0 1000.0 0.0 0.0 4.0 3.0 0.0
Banana 3000.0 0.0 0.0 2000.0 6.0 0.0 0.0 5.0
Full code:
df['Combined'] = df['Colour'] + '_' + df['Grade']
new_df = df.pivot(index='Product', columns=['Combined'], values=['Price', 'Size']).fillna(0)
new_df.columns = [col[0]+'_'+col[1] for col in new_df.columns]