I have 2 csv files, the first one contains timestamp of event. I want to add weather details from the 2nd csv file.
File 1 (irregular time stamps)
Time UserID_Detected
2019-01-01 01:00:32 3
2019-01-02 06:02:12 2
2019-01-02 14:45:32 7
File 2 (hourly weather)
Time Temperature Humidity
1/1/2019 0:00 32 55
1/1/2019 1:00 33 60
1/1/2019 2:00 33 53
.............
1/2/2019 6:00 35 58
.............
1/2/2019 15:00 38 62
I want to attach the weather variable to the first file. Since the first file contains minutes and second info, I will round to nearest hour. Expected output:
File 1 (final output)
Time UserID_Detected Temperature Humidity
2019-01-01 01:00:32 3 33 60
2019-01-02 06:02:12 2 35 58
2019-01-02 14:45:32 7 38 62
Tried so far:
df_event = pd.read_csv("df1.csv")
df_weather = pd.read_csv("df2.csv")
df_event ['Time'] = pd.to_datetime(df_event['Time'])
df_weather ['Time'] = pd.to_datetime(df_weather['Time'])
df_event = df_event .set_index('Time')
df_weather = df_weather .set_index('Time')
df_event = pd.merge(df_event ,df_weather , how='outer',left_index=True, right_index=True)
Use, pd.merge_asof
with direction=nearest
to merge the two dataframes on the column Time
:
df_event['Time'] = pd.to_datetime(df_event['Time'])
df_weather['Time'] = pd.to_datetime(df_weather['Time'])
df_merged = pd.merge_asof(
df_event.sort_values(by='Time'), df_weather.sort_values(by='Time'),
on='Time', direction='nearest')
Result:
# print(df_merged)
Time UserID_Detected Temperature Humidity
0 2019-01-01 01:00:32 3 33 60
1 2019-01-02 06:02:12 2 35 58
2 2019-01-02 14:45:32 7 38 62