Search code examples
pandasmulti-index

Align pandas dataframes with multiindex


I have two pandas dataframes df1 and df2 with a different multiindex. I would like to align both dataframes according to the last index level. The shorter dataframe rules, i.e. all the dates that are not in the shorter dataframe should be removed from the longer dataframe.

If I were to drop the first two index levels, I get the result I am looking for. However, I would like to preserve the complete multiindex.

import numpy as np
import pandas as pd

idx1_l1 = np.tile("provider_1", 31)
idx2_l1 = np.tile("provider_2", 22)
idx1_l2 = np.tile("indicator_1", 31)
idx2_l2 = np.tile("indicator_2", 22)
idx1_l3 = pd.date_range(start="2020-01-01", end="2020-01-31")
idx2_l3 = pd.date_range(start="2020-01-10", end="2020-01-31")
data1 = np.random.randint(low=1, high=100, size=31)
data2 = np.random.randint(low=1, high=100, size=22)

df1 = pd.DataFrame(data=data1, index=[idx1_l1, idx1_l2, idx1_l3])
df2 = pd.DataFrame(data=data2, index=[idx2_l1, idx2_l2, idx2_l3])

df1, df2 = df1.droplevel([0, 1]).align(df2.droplevel([0, 1]), join="inner", axis=0)

Solution

  • You can create the new index for the desired level (2 in your case) by intersection and then select the rows with get_indexer:

    idx = df1.index.get_level_values(2).intersection(df2.index.get_level_values(2))
    df1 = df1.iloc[df1.index.get_level_values(2).get_indexer(idx)]
    df2 = df2.iloc[df2.index.get_level_values(2).get_indexer(idx)]