Search code examples
pythonpandasdataframenumpytime-series

Python : Merge two dataframes based on created time


I have two dfs, two df's has be to be merged by class and joining dates. Please check the below df's

df1


class      teacher  age instructor_joining_date
   A        mark    50  2024-01-20 07:18:29.599
   A        john    45  2024-05-08 05:31:21.379


df2


class   count   student_joining_date
A          1    2024-05-17 01:05:58.072
A         50    2024-04-10 10:39:06.608
A         75    2024-04-05 09:49:07.246


Final output df

class   count   student_joining_date      teacher   age
A         1    2024-05-17 01:05:58.072    john       45
A        50    2024-04-10 10:39:06.608    mark       50
A        75    2024-04-05 09:49:07.246    mark       50

For df2 we have merge df1 by class and joining dates

Edit: Yes if student_joining_date and instructor_joining_date is different. If student_joining_date is greater than instructor_joining_date then that teacher will be mapped here


Solution

  • You have to use a merge_asof, then restore the original order with reindex:

    df1['instructor_joining_date'] = pd.to_datetime(df1['instructor_joining_date'])
    df2['student_joining_date'] = pd.to_datetime(df2['student_joining_date'])
    
    out = (pd.merge_asof(df2.sort_values(by='student_joining_date').reset_index(),
                         df1.sort_values(by='instructor_joining_date'),
                         left_on='student_joining_date',
                         right_on='instructor_joining_date',
                         by='class')
             .set_index('index').reindex(df2.index)
             .drop(columns='instructor_joining_date')
          )
    

    Output:

      class  count    student_joining_date teacher  age
    0     A      1 2024-05-17 01:05:58.072    john   45
    1     A     50 2024-04-10 10:39:06.608    mark   50
    2     A     75 2024-04-05 09:49:07.246    mark   50
    

    Intermediate before dropping the instructor_joining_date column:

      class  count    student_joining_date teacher  age instructor_joining_date
    0     A      1 2024-05-17 01:05:58.072    john   45 2024-05-08 05:31:21.379
    1     A     50 2024-04-10 10:39:06.608    mark   50 2024-01-20 07:18:29.599
    2     A     75 2024-04-05 09:49:07.246    mark   50 2024-01-20 07:18:29.599