Search code examples
pythonpandasdataframepivot-tableunpivot

Pandas: How can I move certain columns into rows?


Suppose I have the df below. I would like to combine the price columns and value columns so that all prices are in one column and all volumes are in another column. I would also like a third column that identified the price level. For example, unit1, unit2 and unit3.

import numpy as np
import pandas as pd
df = pd.DataFrame(
    {
        'uid': ['U100', 'U200', 'E100', 'E200', 'E300', 'A100', 'A200', 'A300', 'A400', 'A500'],
        'location': ['US', 'US', 'EU', 'EU', 'EU', 'Asia', 'Asia', 'Asia', 'Asia', 'Asia'],
        'unit1_price': [10, 20, 15, 10, 10, 10, 20, 20, 25, 25],
        'unit1_vol': [100, 150, 100, 200, 150, 150, 100, 200, 200, 200],
        'unit2_price': [10, 25, 30, 20, 10, 10, 10, 10, 20, 20],
        'unit2_vol': [200, 200, 150, 300, 300, 200, 150, 225, 225, 250],
        'unit3_price': [0, 0, 0, 20, 20, 20, 20, 20, 20, 20],
        'unit3_vol': [0, 0, 0, 500, 500, 500, 500, 500, 500, 500]
    }
)
df

df1

This is what the final df should look like:

df_final


I tried using melt and I think almost have the right answer.

pd.melt(df, id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit1_vol', 'unit2_price', 'unit2_vol', 'unit3_price', 'unit3_vol'])

This is what the partial df looks like with melt:

df2

The problem with the above is that volume and price are in the same column but I want them to be in 2 separate columns.

Did I use the right function?


Solution

  • You can form two dataframe using pd.melt first and combine it back to become one dataframe.

    df1 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_price', 'unit2_price', 'unit3_price'],var_name='unit',value_name='price')
    
    df2 = df.melt(id_vars=['uid', 'location'], value_vars=['unit1_vol', 'unit2_vol', 'unit3_vol'],var_name='unit', value_name="volume")
    
    ddf = pd.concat([df1,df2['volume']],axis=1).sort_values(by=['uid','unit'],ignore_index=True)
    
    ddf['unit']=ddf['unit'].str.split('_',expand=True)[0]