Search code examples
pandasslicemulti-index

Divide pandas multiindex slices by each other


I have a pandas multiindex data frame with four index levels. I am trying to divide a slice of this dataframe by another slice of the same dataframe.

import pandas as pd

df = pd.DataFrame(
    data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
          "indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
          "unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
          "year": ["2017", "2018","2019", "2017","2018","2019"],
          "country1": [1, 2, 3, 2, 4, 6],
          "country2": [4, 5, 6, 40, 50, 60]}
)

df = df.set_index(["data_provider", "indicator", "unit", "year"], drop=True)

print(df.loc[(slice(None), ["ind_a"]), :] / df.loc[(slice(None), ["ind_b"]), :])

Although the individual slices produce a valid slice of df, this simple division leads to all NaN. If I were to drop the first index level and perform the same slicing and division operation, I do get correct results. However, the indicator index-level will then be removed, which makes sense.

df1.droplevel(0)
print(df.loc["ind_a", :] / df.loc["ind_b", :])

At the end of the day, I would like to append the result of the division to the existing df dataframe. I need to assign the first two levels of the multiindex. Something like data_provider="prov_a" and indicator="ind_c". How can I do this?


Solution

  • The source of your problem is that both sides of division have first values at level 1 of the MultiIndex.

    So if you drop this level of the index and then perform the division:

    res = df.loc[(slice(None), ["ind_a"]), :].droplevel([1]) / \
        df.loc[(slice(None), ["ind_b"]), :].droplevel([1])
    

    you will get the proper result.

    To append this result to the source DataFrame, run:

    res2 = pd.concat([res], keys=['ind_c'], names=['indicator']).swaplevel(0,1)
    df = pd.concat([df, res2])
    

    The result is:

                                       country1  country2
    data_provider indicator unit year                    
    prov_a        ind_a     EUR  2017       1.0       4.0
                                 2018       2.0       5.0
                                 2019       3.0       6.0
                  ind_b     EUR  2017       2.0      40.0
                                 2018       4.0      50.0
                                 2019       6.0      60.0
                  ind_c     EUR  2017       0.5       0.1
                                 2018       0.5       0.1
                                 2019       0.5       0.1