I have a pandas dataFrame with 3 columns of weather data - temperature, time and the name of the weather station.
It looks like this:
Time | Station_name | Temperature |
---|---|---|
2022-05-12 22:09:35+00:00 | station_a | 18.3 |
2022-05-12 22:09:42+00:00 | station_b | 18.0 |
I would like to calculate the temperature difference of station_a from station_b at every same minute (as the time stamps are not exactly equal but precise at minute-level (and there is only one measurement every 10 minutes) in a new column.
Is there a way to do this?
You can use a merge_asof
on the two sub-dataframes:
df['Time'] = pd.to_datetime(df['Time'])
out = (pd
.merge_asof(df[df['Station_name'].eq('station_a')],
df[df['Station_name'].eq('station_b')],
on='Time', direction='nearest',
tolerance=pd.Timedelta('1min'),
suffixes=('_a', '_b')
)
.set_index('Time')
.eval('diff = Temperature_b - Temperature_a')
['diff']
)
output:
Time
2022-05-12 22:09:35+00:00 -0.3
Name: diff, dtype: float64
You can also try to round
the times, but it is more risky if one time gets rounded up and the other down:
df['Time'] = pd.to_datetime(df['Time'])
(df
.assign(Time=df['Time'].dt.round('10min'))
.pivot('Time', 'Station_name', 'Temperature')
.eval('diff = station_b - station_a')
)
output:
Station_name station_a station_b diff
Time
2022-05-12 22:10:00+00:00 18.3 18.0 -0.3