Let's say we have a dataframe in this format:
id properties
0 {"cat1":["p1","p2","p4"],"cat2":["p5", "p6"]}
1 {"cat1":["p3"],"cat2":["p7"]}
How can we convert it to this format?
id p1 p2 p3 p4 p5 p6 p7
0 True True False True True True False
1 False False True False False False True
Keep in mind that the type of values in each cell is string. It has just two categories: cat1 and cat2
Set_index to id
. As you said each cell is a string, so you need to convert df.properties
from string of dict to dict by using ast.literal_eval
. Next, and using str
method to get cat1
and cat2
and combine their lists and explode
them to rows and assign result to s
. Finally, call pd.get_dummies
on s
with option dtype=bool
and call max
on level=0
import ast
df1 = df.set_index('id')
df1.properties = df1.properties.map(ast.literal_eval)
s = (df1.properties.str['cat1'] + df1.properties.str['cat2']).explode()
pd.get_dummies(s, dtype=bool).max(level=0)
Out[1035]:
p1 p2 p3 p4 p5 p6 p7
id
0 True True False True True True False
1 False False True False False False True