I need to extract data from a multi choice brand question (Data in excel as separate columns for each options) into side index in a crosstab table. I have tried with passing list in index argument in crosstab, but it doesn't give the desired output. Also helpful if I could get a total column along with it.
Below is the data sample and the required output format.
Brand 1 | Brand 2 | Brand 3 | Brand 4 | Gender |
---|---|---|---|---|
HP | Acer | Male | ||
Dell | Acer | MSI | Apple | Male |
Apple | HP | Asus | Female | |
HP | Apple | Male |
Output required:
Brand | Male | Female |
---|---|---|
Acer | 2 | 0 |
Apple | 2 | 1 |
Asus | 0 | 1 |
Dell | 1 | 0 |
HP | 2 | 1 |
MSI | 1 | 0 |
Really appreciate your efforts.
Try crosstab after replacing blank spaces with nan and stack , then rename the axis if you want:
o = pd.crosstab(df.filter(like="Brand").replace('',np.nan).stack().droplevel(1),
df['Gender'])
out = o.rename_axis(index='Brand',columns=None).reset_index()
print(out)
Brand Female Male
0 Acer 0 2
1 Apple 1 2
2 Asus 1 0
3 Dell 0 1
4 HP 1 2
5 MSI 0 1
Adding another alternative using value_counts and unstack:
a = df.filter(like='Brand').replace('',np.nan).stack().to_frame('Brands').droplevel(1)
out = (a.join(df['Gender']).groupby("Brands")['Gender'].value_counts()
.unstack(fill_value=0))