I have a pandas multiindex data frame with four index levels. I am trying to divide a slice of this dataframe by another slice of the same dataframe.
import pandas as pd
df = pd.DataFrame(
data={"data_provider": ["prov_a", "prov_a", "prov_a", "prov_a", "prov_a", "prov_a"],
"indicator": ["ind_a", "ind_a", "ind_a", "ind_b", "ind_b", "ind_b"],
"unit": ["EUR", "EUR", "EUR", "EUR", "EUR", "EUR"],
"year": ["2017", "2018","2019", "2017","2018","2019"],
"country1": [1, 2, 3, 2, 4, 6],
"country2": [4, 5, 6, 40, 50, 60]}
)
df = df.set_index(["data_provider", "indicator", "unit", "year"], drop=True)
print(df.loc[(slice(None), ["ind_a"]), :] / df.loc[(slice(None), ["ind_b"]), :])
Although the individual slices produce a valid slice of df
, this simple division leads to all NaN. If I were to drop the first index level and perform the same slicing and division operation, I do get correct results. However, the indicator
index-level will then be removed, which makes sense.
df1.droplevel(0)
print(df.loc["ind_a", :] / df.loc["ind_b", :])
At the end of the day, I would like to append the result of the division to the existing df
dataframe. I need to assign the first two levels of the multiindex. Something like data_provider="prov_a"
and indicator="ind_c"
. How can I do this?
The source of your problem is that both sides of division have first values at level 1 of the MultiIndex.
So if you drop this level of the index and then perform the division:
res = df.loc[(slice(None), ["ind_a"]), :].droplevel([1]) / \
df.loc[(slice(None), ["ind_b"]), :].droplevel([1])
you will get the proper result.
To append this result to the source DataFrame, run:
res2 = pd.concat([res], keys=['ind_c'], names=['indicator']).swaplevel(0,1)
df = pd.concat([df, res2])
The result is:
country1 country2
data_provider indicator unit year
prov_a ind_a EUR 2017 1.0 4.0
2018 2.0 5.0
2019 3.0 6.0
ind_b EUR 2017 2.0 40.0
2018 4.0 50.0
2019 6.0 60.0
ind_c EUR 2017 0.5 0.1
2018 0.5 0.1
2019 0.5 0.1