Search code examples
pythonpandasdataframepivot-tablemulti-index

Pandas Python - How to create new columns with MultiIndex from pivot table


I have created a pivot table with 2 different types of values i) Number of apples from 2017-2020, ii) Number of people from 2017-2020. I want to create additional columns to calculate iii) Apples per person from 2017-2020. How can I do so?

Current code for pivot table:

tdf = df.pivot_table(index="States",
                     columns="Year",
                     values=["Number of Apples","Number of People"],
                     aggfunc= lambda x: len(x.unique()),
                     margins=True)
tdf 

Here is my current pivot table:

                Number of Apples                  Number of People
                2017    2018    2019    2020      2017    2018    2019    2020   
California        10      18      20      25         2       3       4       5
West Virginia      8      35      25      12         2       5       5       4
...

I want my pivot table to look like this, where I add additional columns to divide Number of Apples by Number of People.

                Number of Apples                  Number of People                  Number of Apples per Person
                2017    2018    2019    2020      2017    2018    2019    2020      2017    2018    2019    2020   
California        10      18      20      25         2       3       4       5       5       6       5       5      
West Virginia      8      35      25      12         2       5       5       4       4       7       5       3

I've tried a few things, such as:

  • Creating a new column via assigning new column names, but does not work with multiple column index tdf["Number of Apples per Person"][2017] = tdf["Number of Apples"][2017] / tdf["Number of People"][2017]
  • Tried the other assignment method tdf.assign(tdf["Number of Apples per Person"][2017] = tdf["Enrollment ID"][2017] / tdf["Student ID"][2017]); got this error SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

Appreciate any help! Thanks


Solution

  • What you can do here is stack(), do your thing, and then unstack():

    s = df.stack()
    s['Number of Apples per Person'] = s['Number of Apples'] / s['Number of People']
    df = s.unstack()
    

    Output:

    >>> df
                  Number of Apples                Number of People                Number of Apples per Person               
                              2017 2018 2019 2020             2017 2018 2019 2020                        2017 2018 2019 2020
    California                  10   18   20   25                2    3    4    5                         5.0  6.0  5.0  5.0
    West Virginia                8   35   25   12                2    5    5    4                         4.0  7.0  5.0  3.0
    

    One-liner:

    df = df.stack().pipe(lambda x: x.assign(**{'Number of Apples per Person': x['Number of Apples'] / x['Number of People']})).unstack()