I have two pandas dataframes df1
and df2
with a different multiindex. I would like to align both dataframes according to the last index level. The shorter dataframe rules, i.e. all the dates that are not in the shorter dataframe should be removed from the longer dataframe.
If I were to drop the first two index levels, I get the result I am looking for. However, I would like to preserve the complete multiindex.
import numpy as np
import pandas as pd
idx1_l1 = np.tile("provider_1", 31)
idx2_l1 = np.tile("provider_2", 22)
idx1_l2 = np.tile("indicator_1", 31)
idx2_l2 = np.tile("indicator_2", 22)
idx1_l3 = pd.date_range(start="2020-01-01", end="2020-01-31")
idx2_l3 = pd.date_range(start="2020-01-10", end="2020-01-31")
data1 = np.random.randint(low=1, high=100, size=31)
data2 = np.random.randint(low=1, high=100, size=22)
df1 = pd.DataFrame(data=data1, index=[idx1_l1, idx1_l2, idx1_l3])
df2 = pd.DataFrame(data=data2, index=[idx2_l1, idx2_l2, idx2_l3])
df1, df2 = df1.droplevel([0, 1]).align(df2.droplevel([0, 1]), join="inner", axis=0)
You can create the new index for the desired level (2 in your case) by intersection
and then select the rows with get_indexer
:
idx = df1.index.get_level_values(2).intersection(df2.index.get_level_values(2))
df1 = df1.iloc[df1.index.get_level_values(2).get_indexer(idx)]
df2 = df2.iloc[df2.index.get_level_values(2).get_indexer(idx)]