Search code examples
pythonpandasdataframemergemulti-index

Pandas: merge_asof-like solutions for merging two multi-indexed DataFrames?


I have two dataframes, df1 and df2 say, which are both multi-indexed.

At the first index level, both dataframes share the same keys (i.e. df1.index.get_level_values(0) and df2.index.get_level_values(0) contain the same elements). Those keys are unordered strings, such as ['foo','bar','baz'].

At the second index level, both dataframes have timestamps which are ordered, but unequally spaced.

My question is as follows. I would like to merge df1and df2 in such a way that, for each key at level 1, the values of df2 should be inserted into df1 without changing the order of df1.

I tried using pd.merge, pd.merge_asof and pd.MultiIndex.searchsorted. From the descriptions of those methods, it seems like one of them should do the trick for me, but I cannot figure out how. Ideally, I would like to find a solution that avoids looping over the keys in index.get_level_values(0), since my dataframes can get large.

A few failed attempts for illustration:

df_merged = pd.merge(left=df1.reset_index(), right=df2.reset_index(),
                     left_on=[['some_keys', 'timestamps_df1']], right_on=[['some_keys', 'timestamps_df2']],
                     suffixes=('', '_2')
                     ) # after sorting
# FAILED

df2.index.searchsorted(df1, side='right') # after sorting
# FAILED

Any help is greatly appreciated!


Solution

  • Base on your description , here is the solution from merge_asof

    df_merged = pd.merge_asof(left=df1.reset_index(), right=df2.reset_index(),
                         left_on=['timestamps_df1'], right_on=['timestamps_df2'],by='some_keys',
                         suffixes=('', '_2')
                         )