Search code examples
pythonpandastimeshift

Getting the nearest value of a column with time shift in python


I have a dataframe with a datetime (with milisseconds) as index and a price column. I want to create a new column with the nearest price from the row 20s earlier. For example:

Data Hora
2024-02-01 10:03:39.483   1.880
2024-02-01 10:03:40.540   1.900
2024-02-01 10:03:41.550   1.880
2024-02-01 10:03:43.563   1.890
2024-02-01 10:03:45.567   1.870
2024-02-01 10:03:45.583   1.890
2024-02-01 10:03:46.590   1.900
2024-02-01 10:03:48.620   1.930
2024-02-01 10:03:50.627   1.880
2024-02-01 10:03:51.630   1.890
2024-02-01 10:03:53.647   1.880
2024-02-01 10:03:55.753   1.900
2024-02-01 10:03:59.367   1.890
2024-02-01 10:04:02.497   1.910
2024-02-01 10:04:04.543   1.890
2024-02-01 10:04:05.550   1.860
2024-02-01 10:04:07.577   1.840
2024-02-01 10:04:08.157   1.850
2024-02-01 10:04:10.197   1.880
2024-02-01 10:04:11.887   1.910
2024-02-01 10:04:13.163   1.920

Edit:

Hello @mozway, I want to get the price that is closest to the price of the 20s earlier row.

Expected Output:


2024-02-01 10:03:39.483   1.880
2024-02-01 10:03:40.540   1.880
2024-02-01 10:03:41.550   1.880
2024-02-01 10:03:43.563   1.880
2024-02-01 10:03:45.567   1.880
2024-02-01 10:03:45.583   1.880
2024-02-01 10:03:46.590   1.880
2024-02-01 10:03:48.620   1.880
2024-02-01 10:03:50.627   1.880
2024-02-01 10:03:51.630   1.880
2024-02-01 10:03:53.647   1.880
2024-02-01 10:03:55.753   1.880
2024-02-01 10:03:59.367   1.880
2024-02-01 10:04:02.497   1.880
2024-02-01 10:04:04.543   1.870

Solution

  • If you want the price at the time closest to -20s, use a merge_asof:

    df['Data Hora'] = pd.to_datetime(df['Data Hora'])
    
    df['closest_price_20s'] = (pd
       .merge_asof(df['Data Hora'].sub(pd.Timedelta('20s')).rename('dh'),
                   df['Data Hora'],
                   left_on='dh', right_on='Data Hora', direction='nearest',
                   # tolerance=pd.Timedelta('1s') # uncomment to set a tolerance
                  )['Data Hora']
       .map(df.set_index('Data Hora')['Price'])
    )
    

    Output:

                     Data Hora  Price  closest_price_20s
    0  2024-02-01 10:03:39.483   1.88               1.88
    1  2024-02-01 10:03:40.540   1.90               1.88
    2  2024-02-01 10:03:41.550   1.88               1.88
    3  2024-02-01 10:03:43.563   1.89               1.88
    4  2024-02-01 10:03:45.567   1.87               1.88
    5  2024-02-01 10:03:45.583   1.89               1.88
    6  2024-02-01 10:03:46.590   1.90               1.88
    7  2024-02-01 10:03:48.620   1.93               1.88
    8  2024-02-01 10:03:50.627   1.88               1.88
    9  2024-02-01 10:03:51.630   1.89               1.88
    10 2024-02-01 10:03:53.647   1.88               1.88
    11 2024-02-01 10:03:55.753   1.90               1.88
    12 2024-02-01 10:03:59.367   1.89               1.88
    13 2024-02-01 10:04:02.497   1.91               1.88
    14 2024-02-01 10:04:04.543   1.89               1.89
    15 2024-02-01 10:04:05.550   1.86               1.87
    16 2024-02-01 10:04:07.577   1.84               1.90
    17 2024-02-01 10:04:08.157   1.85               1.93
    18 2024-02-01 10:04:10.197   1.88               1.88
    19 2024-02-01 10:04:11.887   1.91               1.89
    20 2024-02-01 10:04:13.163   1.92               1.88