Search code examples
pythonpandasdataframefillna

How to merge 2 dataframes with order


I have 2 dataframes

df_My which looks like this

Seq    Grp
-----------
1      1
2      1
3      2
4      1
5      2
6      2
7      3
8      3
9      1
.
.
135    3
136    1
137    1
138    2
139    3

The other dataframe is df_Filer

df_Filer

Seq    Grp    Alg
-------------------
1      1      J
3      2      F
7      3      Z
136    1      M
137    1      R
138    2      T
139    3      Y

I want to merge both dataframes into 1 where

1- if df_My Seq number has is already in df_Filer then it should merge with that number is df_My based on Seq in both df_My and df_Filter

2- if df_My Seq number is not in df_Filer then it should join with the smallest Seq in the same Grp

so the end result should be as follow

Seq    Grp    Alg
------------------
1      1      J
2      1      J
3      2      F
4      1      J
5      2      F
6      2      F
7      3      Z
8      3      Z
9      1      J
.
.
135    3      Z
136    1      M
137    1      R
138    2      T
139    3      Y

I tried this but it does not give me the expected result

df_Out = df_My.merge(df_Filter, axis=1), on='Grp')

Any idea how to achieve this?


Solution

  • As @sammywemmy suggested, you could use merge_asof:

    out = pd.merge_asof(df1, df2, on='Seq', by='Grp')
    

    Another options is, you can left-merge, then use groupby + idxmin to create a mapping from Grp to Alg for the missing values, then fill:

    df_merged = df1.merge(df2, on='Seq', how='left', suffixes=('','_')).drop(columns=['Grp_'])
    no_nan = df_merged.dropna()
    mapping = df_merged['Grp'].map(no_nan.loc[no_nan.groupby('Grp')['Seq'].idxmin(), ['Grp','Alg']].set_index('Grp')['Alg'])
    df_merged['Alg'] = df_merged['Alg'].fillna(mapping)
    

    Output:

        Seq  Grp Alg
    0     1    1   J
    1     2    1   J
    2     3    2   F
    3     4    1   J
    4     5    2   F
    5     6    2   F
    6     7    3   Z
    7     8    3   Z
    8     9    1   J
    9   135    3   Z
    10  136    1   M
    11  137    1   R
    12  138    2   T
    13  139    3   Y