Search code examples
pythonpandasdatetimedifference

Substract values from two columns where same time (pandas, python)


I have a pandas dataFrame with 3 columns of weather data - temperature, time and the name of the weather station.

It looks like this:

Time Station_name Temperature
2022-05-12 22:09:35+00:00 station_a 18.3
2022-05-12 22:09:42+00:00 station_b 18.0

I would like to calculate the temperature difference of station_a from station_b at every same minute (as the time stamps are not exactly equal but precise at minute-level (and there is only one measurement every 10 minutes) in a new column.

Is there a way to do this?


Solution

  • You can use a merge_asof on the two sub-dataframes:

    df['Time'] = pd.to_datetime(df['Time'])
    
    out = (pd
       .merge_asof(df[df['Station_name'].eq('station_a')],
                   df[df['Station_name'].eq('station_b')],
                   on='Time', direction='nearest',
                   tolerance=pd.Timedelta('1min'),
                   suffixes=('_a', '_b')
                  )
       .set_index('Time')
       .eval('diff = Temperature_b - Temperature_a')
       ['diff']
    )
    

    output:

    Time
    2022-05-12 22:09:35+00:00   -0.3
    Name: diff, dtype: float64
    

    You can also try to round the times, but it is more risky if one time gets rounded up and the other down:

    df['Time'] = pd.to_datetime(df['Time'])
    
    (df
     .assign(Time=df['Time'].dt.round('10min'))
     .pivot('Time', 'Station_name', 'Temperature')
     .eval('diff = station_b - station_a')
    )
    

    output:

    Station_name               station_a  station_b  diff
    Time                                                 
    2022-05-12 22:10:00+00:00       18.3       18.0  -0.3