python pandas dataframe one-hot-encoding

Combine two DataFrames and then separate them


I have two DataFrames with the same column headers, and I want to one-hot encode both of them. I cannot encode them one at a time. Instead, I want to append the two DataFrames, one-hot encode the combined result, and then split it back into two DataFrames, each with its headers.

The code below one-hot encodes them one by one instead of combining them first and then encoding.

train = pd.get_dummies(train, columns=['is_discount', 'gender', 'city'])
test = pd.get_dummies(test, columns=['is_discount', 'gender', 'city'])

Solution

  • Use concat with keys, encode the combined frame once, then split it back by key, i.e.:

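    import pandas as pd

    # Example DataFrames
    train = pd.DataFrame({'x': [1, 2, 3, 4]})
    test = pd.DataFrame({'x': [4, 2, 5, 0]})

    # Concatenate with keys (0 = train, 1 = test) and encode once, so both
    # halves share the same dummy columns. dtype=int gives 0/1 output
    # (pandas >= 2.0 defaults to booleans).
    temp = pd.get_dummies(pd.concat([train, test], keys=[0, 1]), columns=['x'], dtype=int)

    # Select each half back out of the outer MultiIndex level
    train, test = temp.xs(0), temp.xs(1)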
    

    Output:

    # Train
       x_0  x_1  x_2  x_3  x_4  x_5
    0    0    1    0    0    0    0
    1    0    0    1    0    0    0
    2    0    0    0    1    0    0
    3    0    0    0    0    1    0

    # Test
       x_0  x_1  x_2  x_3  x_4  x_5
    0    0    0    0    0    1    0
    1    0    0    1    0    0    0
    2    0    0    0    0    0    1
    3    1    0    0    0    0    0
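
For the columns named in the question, the same approach might look like the sketch below. The sample rows are made up for illustration (only the column names is_discount, gender and city come from the question), and the string keys "train" and "test" are just one possible choice of labels.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
train = pd.DataFrame({
    "is_discount": [0, 1, 0],
    "gender": ["M", "F", "F"],
    "city": ["Paris", "Rome", "Paris"],
})
test = pd.DataFrame({
    "is_discount": [1, 1],
    "gender": ["M", "M"],
    "city": ["Oslo", "Rome"],
})

cols = ["is_discount", "gender", "city"]

# Stack both frames under the keys "train" and "test", then encode once
# so every category seen in either frame gets its own column.
combined = pd.concat([train, test], keys=["train", "test"])
encoded = pd.get_dummies(combined, columns=cols, dtype=int)

# Select each original frame back out of the outer index level.
train_enc = encoded.xs("train")
test_enc = encoded.xs("test")

# Both frames should now expose the same set of dummy columns.
print(list(train_enc.columns) == list(test_enc.columns))
print(train_enc.shape, test_enc.shape)
```

Because the encoding runs on the combined frame, a category that appears in only one of the two frames (here the city "Oslo", which is only in test) still gets a column in both results, filled with zeros where it does not occur.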