Search code examples
pythonpandasdataframepandas-groupby

Pandas merge with average of second dataframe


I have two panda dataframes.

Data frame one has three columns:

name start_time end_time
alice 04:00 05:00
bob 05:00 07:00

Data frame two has three columns:

time points_1 points_2
04:30 5 4
04:45 8 6
05:30 10 3
06:15 4 7
06:55 1 0

I would like to merge the two dataframes such that the first dataframe now has 5 columns:

name start_time end_time average_point_1 average_point_2
alice 04:00 05:00 6.5 5
bob 05:00 07:00 5 3.33

Where the average_point_1 columns consists of average of points_1 from dataframe two between the start and end time for each row. Similarly average_point_2. Could someone tell me how I can merge the two dataframes like this without having to develop an averaging function to make the columns first and then merging.


Solution

  • Try:

    #convert all time fields to datetime for merge_asof compatibility
    df1["start_time"] = pd.to_datetime(df1["start_time"],format="%H:%M")
    df1["end_time"] = pd.to_datetime(df1["end_time"],format="%H:%M")
    df2["time"] = pd.to_datetime(df2["time"],format="%H:%M")
    
    #merge both dataframes on time
    merged = pd.merge_asof(df2, df1, left_on="time", right_on="start_time")
    
    #groupy and get average for each name
    output = merged.groupby(["name", "start_time", "end_time"],as_index=False).mean()
    
    #convert time columns back to strings if needed
    output["start_time"] = output["start_time"].dt.strftime("%H:%M")
    output["end_time"] = output["end_time"].dt.strftime("%H:%M")
    
    >>> output
         name start_time end_time  points_1  points_2
    0  alice      04:00    05:00       6.5  5.000000
    1    bob      05:00    07:00       5.0  3.333333