python-3.x pandas dataframe dummy-variable

I have a DataFrame's columns and data in lists; I want to put the relevant data into the relevant columns


Suppose you are given a list of all the items you can have, and separately a list of data whose shape is not fixed: each entry may contain any number of items. You want to create a DataFrame from it, putting each item in the right column. For example:

columns = ['shirt','shoe','tie','hat']
data = [['hat','tie'],
        ['shoe', 'tie', 'shirt'],
        ['tie', 'shirt',]]
# and from this I want to create dummy variables like this:
  shirt  shoe  tie  hat
0   0     0     1    1
1   1     1     1    0
2   1     0     1    0


Solution

  • If you want indicator columns filled with 0 and 1 only, use MultiLabelBinarizer. Add DataFrame.reindex to order the columns by the list; with fill_value=0, any value from the list that does not occur in the data is added as a column of zeros:

    columns = ['shirt','shoe','tie','hat']
    data = [['hat','tie'],
            ['shoe', 'tie', 'shirt'],
            ['tie', 'shirt',]]
    
    import pandas as pd
    from sklearn.preprocessing import MultiLabelBinarizer
    
    mlb = MultiLabelBinarizer()
    df = (pd.DataFrame(mlb.fit_transform(data),columns=mlb.classes_)
            .reindex(columns, axis=1, fill_value=0))
    print (df)
       shirt  shoe  tie  hat
    0      0     0    1    1
    1      1     1    1    0
    2      1     0    1    0
    

    Or use Series.str.get_dummies, after joining each list with '|' (its default separator):

    df = pd.Series(data).str.join('|').str.get_dummies().reindex(columns, axis=1, fill_value=0)
    print (df)
       shirt  shoe  tie  hat
    0      0     0    1    1
    1      1     1    1    0
    2      1     0    1    0
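
To illustrate what reindex with fill_value=0 does in the edge cases, here is a minimal sketch using the get_dummies approach. The data is changed for illustration: 'shoe' never occurs, and 'scarf' is a hypothetical item that is not in columns. Under those assumptions, 'shoe' should come back as an all-zero column and 'scarf' should be dropped.

```python
import pandas as pd

columns = ['shirt', 'shoe', 'tie', 'hat']

# Illustrative data (not from the question): 'shoe' is never used,
# and 'scarf' is a hypothetical item that is absent from `columns`.
data = [['hat', 'tie'],
        ['tie', 'shirt', 'hat'],
        ['tie', 'scarf']]

# Join each list into one '|'-separated string, then split it into 0/1 columns.
dummies = pd.Series(data).str.join('|').str.get_dummies()

# Keep only the wanted columns, in the wanted order; missing ones become 0.
df = dummies.reindex(columns, axis=1, fill_value=0)

print(list(df.columns))      # ['shirt', 'shoe', 'tie', 'hat']
print(df['shoe'].tolist())   # [0, 0, 0]
```

Note that reindex silently discards labels that are not in columns, so a misspelled item in the data would disappear rather than raise an error.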