Tags: python, pandas, one-hot-encoding

How to print horizontally, pandas (One-Hot-Encoding)


I would like to change the output format of my pandas one-hot-encoding script. It currently prints three separate tables stacked vertically, each with its own index. I want a single table printed horizontally, with one shared index. The code and output are below. If possible, I would also like extra space between the column groups to separate them.

Code:

import pandas as pd

df = pd.read_csv('Filename')
df.columns = ['Date','b1','b2','b3']
df = df.set_index('Date')

reversed_df = df.iloc[::-1]

BallOne = pd.get_dummies(reversed_df.b1[:5])
BallTwo = pd.get_dummies(reversed_df.b2[:5])
BallThree = pd.get_dummies(reversed_df.b3[:5])
print(BallOne, "\n")
print(BallTwo, "\n")
print(BallThree, "\n")

Output:
            2  5  6  8
Date                  
1996-12-16  0  0  1  0
1996-12-17  0  0  0  1
1996-12-18  0  1  0  0
1996-12-19  1  0  0  0
1996-12-20  0  0  1  0 

            3  5  8  9
Date                  
1996-12-16  0  1  0  0
1996-12-17  0  0  0  1
1996-12-18  0  1  0  0
1996-12-19  1  0  0  0
1996-12-20  0  0  1  0 

            1  5  7  9
Date                  
1996-12-16  0  0  0  1
1996-12-17  1  0  0  0
1996-12-18  0  0  1  0
1996-12-19  0  1  0  0
1996-12-20  0  0  0  1

I would like the output to look like this:

            2  5  6  8        3  5  8  9        1  5  7  9 
Date                  
1996-12-16  0  0  1  0        0  1  0  0        0  0  0  1 
1996-12-17  0  0  0  1        0  0  0  1        1  0  0  0 
1996-12-18  0  1  0  0        0  1  0  0        0  0  1  0 
1996-12-19  1  0  0  0        1  0  0  0        0  1  0  0 
1996-12-20  0  0  1  0        0  0  1  0        0  0  0  1

Solution

  • You can use pandas.concat() with axis=1 here to join the frames side by side on their shared index.

    import pandas as pd
    
    df_1 = pd.DataFrame({1: [0, 1, 0, 1, 0], 7: [0, 1, 0, 0, 0]}, index=pd.date_range('2019-01-01', '2019-01-05'))
    
    df_2 = pd.DataFrame({2: [0, 1, 1, 1, 0], 7: [0, 1, 1, 1, 1]}, index=pd.date_range('2019-01-01', '2019-01-05'))
    
    print(pd.concat([df_1, df_2], axis=1))
    

    Gives:

                1  7  2  7
    2019-01-01  0  0  0  0
    2019-01-02  1  1  1  1
    2019-01-03  0  0  1  1
    2019-01-04  1  0  1  1
    2019-01-05  0  0  0  1
    

    With the data you provided, some column labels are duplicated across the frames. One way to resolve this is to pass keys, which adds an outer column level naming the frame each column came from.

    print(pd.concat([df_1, df_2], keys=['df_1', 'df_2'], axis=1))
    

    Gives:

               df_1    df_2   
                  1  7    2  7
    2019-01-01    0  0    0  0
    2019-01-02    1  1    1  1
    2019-01-03    0  0    1  1
    2019-01-04    1  0    1  1
    2019-01-05    0  0    0  1
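
Applied to the frames from the question, the same idea might look like the sketch below. The `sample` frame is an assumption: it is reconstructed from the five rows of output shown in the question and stands in for the CSV data, and the names `ball_one`, `combined`, and `flat` are only illustrative. The last step shows an alternative that also avoids duplicate labels, letting `get_dummies` prefix each column with its source name instead of using `keys`.

```python
import pandas as pd

# Sample data reconstructed from the output shown in the question;
# in the real script these rows would come from the CSV file.
sample = pd.DataFrame(
    {
        "b1": [6, 8, 5, 2, 6],
        "b2": [5, 9, 5, 3, 8],
        "b3": [9, 1, 7, 5, 9],
    },
    index=pd.Index(
        ["1996-12-16", "1996-12-17", "1996-12-18", "1996-12-19", "1996-12-20"],
        name="Date",
    ),
)

# One-hot encode each ball column separately, as in the question.
# dtype=int keeps the 0/1 display (newer pandas versions default to booleans).
ball_one = pd.get_dummies(sample["b1"], dtype=int)
ball_two = pd.get_dummies(sample["b2"], dtype=int)
ball_three = pd.get_dummies(sample["b3"], dtype=int)

# Join side by side on the shared Date index; keys adds an outer column
# level so the labels that repeat across frames stay distinguishable.
combined = pd.concat(
    [ball_one, ball_two, ball_three],
    keys=["b1", "b2", "b3"],
    axis=1,
)
print(combined)

# Alternative: encode all three columns in one call. The column names are
# used as prefixes (b1_2, b1_5, ...), which gives a flat, single-level header.
flat = pd.get_dummies(sample, columns=["b1", "b2", "b3"], dtype=int)
print(flat)
```

With `keys`, the printed header groups the columns under `b1`, `b2`, and `b3`, which gives some visual separation between the groups; the prefixed variant is easier to work with if the result is saved back to CSV.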