Search code examples
pythonpandascrosstab

Can I include multiple data columns from excel to side index in pandas, python


I need to extract data from a multi choice brand question (Data in excel as separate columns for each options) into side index in a crosstab table. I have tried with passing list in index argument in crosstab, but it doesn't give the desired output. Also helpful if I could get a total column along with it.

Below is the data sample and the required output format.

Brand 1 Brand 2 Brand 3 Brand 4 Gender
HP Acer Male
Dell Acer MSI Apple Male
Apple HP Asus Female
HP Apple Male

Output required:

Brand Male Female
Acer 2 0
Apple 2 1
Asus 0 1
Dell 1 0
HP 2 1
MSI 1 0

Really appreciate your efforts.


Solution

  • Try crosstab after replacing blank spaces with nan and stack , then rename the axis if you want:

    o = pd.crosstab(df.filter(like="Brand").replace('',np.nan).stack().droplevel(1),
                    df['Gender'])
    out = o.rename_axis(index='Brand',columns=None).reset_index()
    

    print(out)
    
       Brand  Female  Male
    0   Acer       0     2
    1  Apple       1     2
    2   Asus       1     0
    3   Dell       0     1
    4     HP       1     2
    5    MSI       0     1
    

    Adding another alternative using value_counts and unstack:

    a = df.filter(like='Brand').replace('',np.nan).stack().to_frame('Brands').droplevel(1)
    out = (a.join(df['Gender']).groupby("Brands")['Gender'].value_counts()
            .unstack(fill_value=0))