My input is this dict:
response = {
'A': ['CATEGORY 2'],
'B': ['CATEGORY 1', 'CATEGORY 2'],
'C': [],
'D': ['CATEGORY 3'],
}
And I'm trying to build this kind of dataframe:
| ITEM | CATEGORY 1 | CATEGORY 2 | CATEGORY 3 |
| A | | x | |
| B | x | x | |
| C | | | |
| D | | | x |
For that I wrote the code below, but the result was not what I expected:
df = pd.DataFrame.from_dict(response, orient='index').fillna('x')
df = df.reset_index()
df = df.rename(columns={'index': 'ITEM'})
print(df)
  ITEM           0           1
0    A  CATEGORY 2           x
1    B  CATEGORY 1  CATEGORY 2
2    C           x           x
3    D  CATEGORY 3           x
Do you have a solution for this? I'm open to any suggestions.
Using str.get_dummies:
out = (pd.Series(map('|'.join, response.values()), index=response.keys())
         .rename_axis('ITEM')
         .str.get_dummies()
         #.replace({0: '', 1: 'x'}) # uncomment for Xs
         .reset_index()
      )
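To see why this works, here is a minimal, self-contained sketch of the same chain split into steps. The intermediate names (`joined`, `dummies`, `marked`) are mine, for illustration only; the calls are the same ones used above.

```python
import pandas as pd

response = {
    'A': ['CATEGORY 2'],
    'B': ['CATEGORY 1', 'CATEGORY 2'],
    'C': [],
    'D': ['CATEGORY 3'],
}

# Step 1: one string per item, categories joined with '|'
# (an empty list becomes the empty string)
joined = pd.Series({k: '|'.join(v) for k, v in response.items()})

# Step 2: str.get_dummies splits on '|' (its default separator)
# and builds one 0/1 indicator column per distinct category
dummies = joined.str.get_dummies()

# Step 3 (optional): swap the 0/1 indicators for '' / 'x'
marked = dummies.replace({0: '', 1: 'x'})

# Turn the index into a regular ITEM column
out = marked.rename_axis('ITEM').reset_index()
print(out)
```

Item C keeps its row because the empty string simply matches no category.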
Or, using a dictionary comprehension:
out = (pd.DataFrame
         .from_dict({k: {c: 'x' for c in cats} for k, cats in response.items()},
                    orient='index')
         .rename_axis('ITEM').reset_index()
      )
Output:
  ITEM  CATEGORY 1  CATEGORY 2  CATEGORY 3
0    A           0           1           0
1    B           1           1           0
2    C           0           0           0
3    D           0           0           1
With Xs:
  ITEM  CATEGORY 1  CATEGORY 2  CATEGORY 3
0    A                       x
1    B           x           x
2    C
3    D                                   x
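A note on the dictionary-comprehension variant: cells with no match come back as NaN rather than empty strings, and the columns follow the order in which categories are first seen. An item with an empty list (like C) may also be missing from the result. Below is a hedged sketch that pins all of this down with reindex and fillna; the helper name `categories` is my own, and the sorted column order is an assumption about what you want.

```python
import pandas as pd

response = {
    'A': ['CATEGORY 2'],
    'B': ['CATEGORY 1', 'CATEGORY 2'],
    'C': [],
    'D': ['CATEGORY 3'],
}

# Every category that occurs anywhere, sorted (assumed desired column order)
categories = sorted({c for cats in response.values() for c in cats})

out = (
    pd.DataFrame.from_dict(
        {k: {c: 'x' for c in cats} for k, cats in response.items()},
        orient='index',
    )
    # Force one row per item and one column per category, in a fixed order
    .reindex(index=list(response), columns=categories)
    # Unmatched cells are NaN at this point; blank them out
    .fillna('')
    .rename_axis('ITEM')
    .reset_index()
)
print(out)
```

The reindex call is what guarantees a row for every key of `response`, whatever the constructor did with the empty entries.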