python-3.x pandas dataframe dummy-variable

I have a DataFrame's columns and data in lists, and I want to put each value into its matching column

Suppose you are given a list of every item that can occur and, separately, a list of records. The records have no fixed length: each one may contain any number of items. I want to build a DataFrame from them and put each item in the right column. For example:

columns = ['shirt','shoe','tie','hat']
data = [['hat','tie'],
        ['shoe', 'tie', 'shirt'],
        ['tie', 'shirt',]]
# and from this I want to create dummy variables like this
  shirt  shoe  tie  hat
0   0     0     1    1
1   1     1     1    0
2   1     0     1    0

Solution

If you want indicator columns filled only with 0 and 1, use MultiLabelBinarizer. Chain DataFrame.reindex to put the columns in the order given by the list; if a value from the list never occurs in the data, fill_value=0 adds it as an all-zero column:

columns = ['shirt','shoe','tie','hat']
data = [['hat','tie'],
        ['shoe', 'tie', 'shirt'],
        ['tie', 'shirt',]]

import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
df = (pd.DataFrame(mlb.fit_transform(data),columns=mlb.classes_)
        .reindex(columns, axis=1, fill_value=0))
print (df)
   shirt  shoe  tie  hat
0      0     0    1    1
1      1     1    1    0
2      1     0    1    0
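
To make the role of reindex concrete, here is a minimal sketch that uses pandas only. The `encoded` frame and its 'scarf' column are hypothetical, standing in for what an encoder might return when no row contains 'shoe' or 'shirt' and one row contains an item that is not in the list:

```python
import pandas as pd

columns = ['shirt', 'shoe', 'tie', 'hat']

# Hypothetical indicator frame: no 'shirt' or 'shoe' column,
# plus an unlisted 'scarf' column.
encoded = pd.DataFrame({'hat': [1, 0], 'scarf': [0, 1], 'tie': [1, 1]})

# reindex orders the columns by the list, adds the missing ones
# filled with 0, and drops columns that are not in the list.
aligned = encoded.reindex(columns, axis=1, fill_value=0)
print(aligned)
```

Without fill_value=0 the added columns would contain NaN instead of 0.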

Or join each list with '|' and use Series.str.get_dummies, which splits on '|' by default:

df = (pd.Series(data)
        .str.join('|')
        .str.get_dummies()
        .reindex(columns, axis=1, fill_value=0))
print (df)
   shirt  shoe  tie  hat
0      0     0    1    1
1      1     1    1    0
2      1     0    1    0
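
Note that the '|'-joined approach would split any label that itself contains '|'. If that could be a concern, one possible pandas-only variant (a sketch, not part of the original answer; the name `df_alt` is just for illustration) is to explode the lists and aggregate ordinary dummies per original row:

```python
import pandas as pd

columns = ['shirt', 'shoe', 'tie', 'hat']
data = [['hat', 'tie'],
        ['shoe', 'tie', 'shirt'],
        ['tie', 'shirt']]

# One row per (record, item) pair; the index keeps the record number.
s = pd.Series(data).explode()

df_alt = (pd.get_dummies(s)           # one indicator column per item
            .groupby(level=0).max()   # collapse back to one row per record
            .astype(int)              # 0/1 instead of bool
            .reindex(columns, axis=1, fill_value=0))
print(df_alt)
```

This should give the same 0/1 table as the two approaches above, without relying on a separator character.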