Python pandas: dynamic concatenation from get_dummies


I have the following DataFrame:

import pandas as pd

cars = ["BMV", "Mercedes", "Audi"]
customer = ["Juan", "Pepe", "Luis"]
price = [100, 200, 300]
year = [2022, 2021, 2020]


df_raw = pd.DataFrame(list(zip(cars, customer, price, year)),
                      columns=["cars", "customer", "price", "year"])

I need to one-hot encode the categorical variables cars and customer. To do this, I call pd.get_dummies on each of these two columns and concatenate the results with the numerical columns.

numerical = ["price", "year"]
df_final = pd.concat([df_raw[numerical], pd.get_dummies(df_raw.cars),
                      pd.get_dummies(df_raw.customer)], axis=1)

Is there a way to generate these dummies dynamically, for example by putting the column names in a list and looping over them with a for loop? With only two columns this is simple, but if I had 30 or 60 attributes, would I have to go through them one by one?


Solution

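  • pd.get_dummies accepts a columns argument, so a list of column names can be encoded in a single call, with no loop or concat: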

    pd.get_dummies(df_raw, columns=['cars', 'customer'])
    

       price  year  cars_Audi  cars_BMV  cars_Mercedes  customer_Juan  customer_Luis  customer_Pepe
    0    100  2022          0         1              0              1              0              0
    1    200  2021          0         0              1              0              0              1
    2    300  2020          1         0              0              0              1              0
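
For the 30 or 60 column case, the list of categorical columns does not have to be typed by hand either. The following is only a sketch, and it assumes that every non-numeric column should be treated as categorical, which may not hold for your data (dates or free-text columns would need to be excluded first). The variable name categorical is just illustrative. The dtype=int argument is there because recent pandas releases return boolean dummies by default, so without it the output would show True/False rather than 0/1.

```python
import pandas as pd

df_raw = pd.DataFrame({
    "cars": ["BMV", "Mercedes", "Audi"],
    "customer": ["Juan", "Pepe", "Luis"],
    "price": [100, 200, 300],
    "year": [2022, 2021, 2020],
})

# Assumption: any column that is not numeric is a categorical variable.
categorical = df_raw.select_dtypes(exclude="number").columns.tolist()

# Encode all detected columns in one call; numeric columns pass through untouched.
# dtype=int forces 0/1 integers instead of the boolean default of newer pandas.
df_final = pd.get_dummies(df_raw, columns=categorical, dtype=int)

print(df_final)
```

If the columns argument is left out entirely, pd.get_dummies(df_raw) encodes every object, string, or category column on its own, so in this example it should give the same result. Passing the list explicitly just makes the selection visible and easy to adjust.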