Search code examples
python-3.xpandaspivotpivot-tabledata-wrangling

How pivot/pivot table the pandas dataframe with multiple column values


I have the dataframe. `

data = pd.DataFrame([['Benz', 'MinSpeed', 0, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'TopSpeed', 200, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'ChasisNum', 654121, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'Seats', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'AirBags', 5, np.nan, 'USA', '2022-08-12'],
                     ['Benz', 'VehicleType', np.nan, 'Sedan', 'USA', '2022-08-12'],
                     ['Benz', 'Color', np.nan, 'Black','USA', '2022-08-12'],
                     ['Benz', 'InternetInside', np.nan, 'Yes','USA', '2022-08-12'],
                     
                     ['Ferrari', 'MinSpeed', 0, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'TopSpeed', 250, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'ChasisNum', 781121, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'Seats', 4, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'AirBags', 2, np.nan, 'France', '2022-12-25'],
                     ['Ferrari', 'VehicleType', np.nan, 'SUV', 'France', '2022-12-25'],
                     ['Ferrari', 'Color', np.nan, 'Red','France', '2022-12-25'],
                     ['Ferrari', 'InternetInside', np.nan, 'No','France', '2022-12-25'],
                     ], 
                    columns= ['CarModel', 'Features', 'NumericalValues', 'CategoricalValues','Country', 'DeliveryDate'])

`

I am trying the pivot the data using the pivot function but getting repeated columns for "NumericalValues" and "CategoricalValues" values

Code: `

data.pivot(index='CarModel', columns='Features', values=['NumericalValues','CategoricalValues' ]).reset_index()

`

I need the expected output as: `

output_data = pd.DataFrame([['Benz', 0, 200, 654121, 5, 5, 'Sedan', 'Black', 'Yes', 'USA', '2022-08-12'],
                         ['Ferrari', 0, 250, 781121, 4, 2, 'SUV', 'Red', 'No', 'France', '2022-12-25']
                     ],
                    columns=['CarModel', 'MinSpeed', 'TopSpeed', 'ChasisNum','Seats', 'AirBags', 'VehicleType', 'Color', 'InternetInside', 'Country', 'DeliveryDate'])

` I tried with Pivot table as well but unable to get this output.


Solution

  • From your data, as you have a value either in 'NumericalValues or 'CategoricalValues', so you can create a column that combine all the information from both column with fillna, then use the pivot as you did.

    res = (
        data.assign(Values=lambda x: x['NumericalValues'].fillna(x['CategoricalValues']))
          .pivot(index='CarModel', columns='Features', values='Values')
          .reset_index().rename_axis(columns=None)
    )
    print(res)
    #   CarModel AirBags ChasisNum  Color InternetInside MinSpeed Seats TopSpeed  \
    # 0     Benz     5.0  654121.0  Black            Yes      0.0   5.0    200.0   
    # 1  Ferrari     2.0  781121.0    Red             No      0.0   4.0    250.0   
    
    #   VehicleType  
    # 0       Sedan  
    # 1         SUV