I have a dataframe like this:
import numpy as np
import pandas as pd
df = (pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3'],
                    'Values': [['AB', 'BC'], np.nan, ['AB', 'CD']]}))
df
ID Values
0 ID1 [AB, BC]
1 ID2 NaN
2 ID3 [AB, CD]
I want to split the items inside each list into indicator columns, such that:
ID AB BC CD
0 ID1 1 1 0
1 ID2 0 0 0
2 ID3 1 0 1
Pandas string functions handle missing values gracefully, so use Series.str.join together with Series.str.get_dummies. DataFrame.pop extracts the column, and DataFrame.join attaches the result back to the original data:
df = df.join(df.pop('Values').str.join('|').str.get_dummies())
print(df)
ID AB BC CD
0 ID1 1 1 0
1 ID2 0 0 0
2 ID3 1 0 1
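To make the chain above easier to follow, here is a minimal sketch that runs the same steps one at a time. The names joined, dummies and result are only illustrative, and drop is used in place of pop so the input frame is left untouched. It also assumes no value contains the '|' character, since that is the separator.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3'],
                   'Values': [['AB', 'BC'], np.nan, ['AB', 'CD']]})

# Step 1: join each list into one '|'-separated string; NaN stays NaN.
joined = df['Values'].str.join('|')

# Step 2: build one 0/1 indicator column per distinct token;
# the row with the missing value becomes all zeros.
dummies = joined.str.get_dummies(sep='|')

# Step 3: attach the indicator columns to the remaining columns by index.
result = df.drop(columns='Values').join(dummies)
print(result)
```

Keeping the intermediate Series around makes it easy to inspect what each step produced if the final columns are not what you expect.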
EDIT: If the values are not lists but string representations of lists, use ast.literal_eval to convert them to lists first:
import ast
df = (df.join(df.pop('Values')
.apply(ast.literal_eval)
.str.join('|')
.str.get_dummies()))
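One caveat: as far as I know, ast.literal_eval raises a ValueError when it is handed a float NaN instead of a string, so the snippet above can fail if the column also contains missing values. A minimal sketch of a NaN-tolerant variant follows. The sample data and the helper name parse_list are assumptions for illustration, not part of the original question.

```python
import ast

import numpy as np
import pandas as pd

# Hypothetical input: lists stored as strings, with one missing value.
df = pd.DataFrame({'ID': ['ID1', 'ID2', 'ID3'],
                   'Values': ["['AB', 'BC']", np.nan, "['AB', 'CD']"]})


def parse_list(value):
    """Parse a string such as "['AB', 'BC']" into a list; pass anything else through."""
    return ast.literal_eval(value) if isinstance(value, str) else value


df = df.join(df.pop('Values')
               .apply(parse_list)   # strings become lists, NaN stays NaN
               .str.join('|')
               .str.get_dummies())
print(df)
```

The isinstance check only parses real strings, so missing values reach str.join unchanged and end up as all-zero rows, just as in the list-based version.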