python, pandas, dataframe, multi-index

Add columns to a new level in multiindex dataframe


My dataframe looks like this:

import pandas as pd

data = {
 'WholesalerID': {0: 121, 1: 121, 2: 42, 3: 42, 4: 54, 5: 43, 6: 432, 7: 4245, 8: 4245, 9: 4245, 10: 457},
 'Brand': {0: 'Vans', 1: 'Nike', 2: 'Nike', 3: 'Vans',4: 'Vans', 5: 'Nike', 6: 'Puma', 7: 'Vans', 8: 'Nike', 9: 'Puma', 10: 'Converse'},
 'Shop 1': {0: 'Yes', 1: 'No', 2: 'Yes', 3: 'Maybe', 4: 'Yes', 5: 'No', 6: 'Yes', 7: 'Yes', 8: 'Maybe', 9: 'Maybe', 10: 'No'}
}
df = pd.DataFrame.from_dict(data)

df = df.assign(count=1)
pivoted_df = pd.pivot_table(
    df,
    index=["Brand"],
    columns=["Shop 1"],
    values=["count"],
    aggfunc={"count": "count"},
    fill_value=0,
    margins=True,
    margins_name="N",
)


I need to add the columns N, count, and prop on the first level. I am trying the following, but it does not work:

pivoted_df.columns = pd.MultiIndex.from_product(
    [pivoted_df.columns, ["N", "count", "prop"]]
)

Desired output: each Shop 1 answer (Yes, Maybe, No) as a top-level column, with N, count, and prop columns beneath it.


Solution

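  • Here is one way to do it with pandas indexing and concat. For each answer, select its count column together with the N margin column, move both under a top-level label named after the answer, compute prop, and then concatenate the pieces side by side: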

    dfs = []
    for col in ("Yes", "Maybe", "No"):
        tmp = (
            pivoted_df.loc[:"Vans", [("count", col), ("count", "N")]]
            .rename(columns={col: "count"})
            .rename(columns={"count": col}, level=0)
        )
        tmp[(col, "prop")] = tmp[(col, "count")] / tmp[(col, "N")]
        dfs.append(tmp.reindex(columns=[(col, "N"), (col, "count"), (col, "prop")]))
    
    new_df = pd.concat(dfs, axis=1)
    

    Then:

    print(new_df)
    # Output
             Yes             Maybe             No
    Shop 1     N count  prop     N count  prop  N count prop
    Brand
    Converse   1     0  0.00     1     0  0.00  1     1  1.0
    Nike       4     1  0.25     4     1  0.25  4     2  0.5
    Puma       2     1  0.50     2     1  0.50  2     0  0.0
    Vans       4     3  0.75     4     1  0.25  4     0  0.0
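As a cross-check, the same table can probably be built without the rename steps. The sketch below is an alternative, not the method above: it assumes `pd.crosstab` is an acceptable substitute for `pivot_table`, drops the unused `WholesalerID` column, and uses illustrative names (`counts`, `totals`, `blocks`, `alt_df`) that do not appear in the question. It also makes explicit that relabelling the existing columns cannot be enough on its own, because the prop values have to be computed.

```python
import pandas as pd

# Same Brand / Shop 1 values as in the question (WholesalerID omitted; it is not used).
df = pd.DataFrame(
    {
        "Brand": ["Vans", "Nike", "Nike", "Vans", "Vans", "Nike",
                  "Puma", "Vans", "Nike", "Puma", "Converse"],
        "Shop 1": ["Yes", "No", "Yes", "Maybe", "Yes", "No",
                   "Yes", "Yes", "Maybe", "Maybe", "No"],
    }
)

answers = ["Yes", "Maybe", "No"]

# Count rows per (Brand, answer). crosstab is used here instead of pivot_table.
counts = pd.crosstab(df["Brand"], df["Shop 1"])[answers]

# Row totals play the role of the "N" margin column.
totals = counts.sum(axis=1)

# One (N, count, prop) block per answer.
blocks = [
    pd.DataFrame(
        {"N": totals, "count": counts[answer], "prop": counts[answer] / totals}
    )
    for answer in answers
]

# keys= supplies the new top column level, one label per block.
alt_df = pd.concat(blocks, axis=1, keys=answers)

print(alt_df)
```

Passing `keys=` to `pd.concat` is what creates the extra column level, so no `rename` or `reindex` is needed afterwards. The loop in the answer above is the better fit when the pivot table already exists and should be reused.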