Search code examples
pythonpandasdataframedatetimemerge

Merge 3 dataframes with different timesteps (10min and 15min and 30min) using pandas


The goal is to merge three different dataframes having different timesteps (10min, 15min and 30min. The code must recognize what timestep to consider firstly and identify the next available next timestep. in This example 2019/04/02 10:40:00 does not exist in the dataframes dataset. Therefore the next timestep to consider after 2019/04/02 10:30:00 would be 2019/04/02 10:45:00.

df1:

Timestamp data1
2019/04/02 10:00:00 1
2019/04/02 10:10:00 1
2019/04/02 10:20:00 1
2019/04/02 10:30:00 1

df2:

Timestamp data2
2019/04/02 10:00:00 2
2019/04/02 10:15:00 22
2019/04/02 10:30:00 222
2019/04/02 10:45:00 2222
2019/04/02 11:00:00 22222

df3:

Timestamp data3
2019/04/02 10:00:00 3
2019/04/02 10:30:00 33
2019/04/02 11:00:00 333
2019/04/02 11:30:00 3333

desired result:

Timestamp data1 data2 data3
2019/04/02 10:00:00 1 2 3
2019/04/02 10:10:00 1 NaN NaN
2019/04/02 10:15:00 NaN 22 NaN
2019/04/02 10:20:00 1 NaN NaN
2019/04/02 10:30:00 1 222 33
2019/04/02 10:45:00 NaN 2222 NaN
2019/04/02 11:00:00 NaN 22222 333
2019/04/02 11:30:00 NaN NaN 3333

I used the python concat function and the merge function but did not deliver he desired result.


Solution

  • # Convert Timestamp columns to datetime
    df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
    df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
    df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])
    
    # Sort the DataFrames based on Timestamp
    df1 = df1.sort_values('Timestamp')
    df2 = df2.sort_values('Timestamp')
    df3 = df3.sort_values('Timestamp')
    
    # Merge using merge_asof
    result = pd.merge_asof(df1, df2, on='Timestamp', direction='nearest')
    result = pd.merge_asof(result, df3, on='Timestamp', direction='nearest')
    

    EDIT: IF YOU WANT TO KEEP NAN VALUES

    # Convert Timestamp columns to datetime
    df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
    df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
    df3['Timestamp'] = pd.to_datetime(df3['Timestamp'])
    
    # Merge using merge with how='outer'
    result = pd.merge(df1, df2, on='Timestamp', how='outer')
    result = pd.merge(result, df3, on='Timestamp', how='outer')
    
    # Sort the result based on Timestamp
    result = result.sort_values('Timestamp')