Tags: python, pandas, datetime, timedelta

How to calculate differences between two pandas.Timestamp Series in nanoseconds


I have two Series of pd.Timestamp values that are extremely close to each other. I'd like to get the elementwise difference between the two Series with nanosecond precision.

First Series:

0    2021-05-21 00:02:11.349001429
1    2021-05-21 00:02:38.195857153
2    2021-05-21 00:03:25.527530228
3    2021-05-21 00:03:26.653410069
4    2021-05-21 00:03:26.798157366

Second Series:

0    2021-05-21 00:02:11.348997322
1    2021-05-21 00:02:38.195852267
2    2021-05-21 00:03:25.527526087
3    2021-05-21 00:03:26.653406759
4    2021-05-21 00:03:26.798154350

If I simply use the - operator, the nanosecond part of the difference appears to be truncated. The result looks something like this:

Series1 - Series2
0    00:00:00.000004
1    00:00:00.000004
2    00:00:00.000004
3    00:00:00.000003
4    00:00:00.000003

I don't want to lose the nanosecond precision when calculating the differences between Timestamps. I have hacked together a workaround that loops over each row, calculates the scalar difference as a pd.Timedelta, and then combines its microseconds and nanoseconds. Like this (for the first element):

single_diff = Series1[0] - Series2[0]
single_diff.microseconds * 1000 + single_diff.nanoseconds
4107

Is there a neater vectorized way to do this, instead of a for loop?


Solution

  • You won't lose precision if you work with timedelta as shown. With the default nanosecond resolution, the internal representation is an integer count of nanoseconds. After calculating the timedelta, you can convert it to an integer to obtain the difference in nanoseconds. For example:

    import pandas as pd
    import numpy as np
    
    s1 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.349001429",
                         "2021-05-21 00:02:38.195857153",
                         "2021-05-21 00:03:25.527530228",
                         "2021-05-21 00:03:26.653410069",
                         "2021-05-21 00:03:26.798157366"]))
    
    s2 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.348997322",
                         "2021-05-21 00:02:38.195852267",
                         "2021-05-21 00:03:25.527526087",
                         "2021-05-21 00:03:26.653406759",
                         "2021-05-21 00:03:26.798154350"]))
    
    # Subtract elementwise, then cast the timedelta values to integer nanoseconds
    delta = (s1 - s2).astype(np.int64)
    
    print(delta)
    # 0    4107
    # 1    4886
    # 2    4141
    # 3    3310
    # 4    3016
    # dtype: int64
    

    Note: I'm using NumPy's int64 type here because on some systems (typically Windows with older NumPy versions) the built-in int maps to a 32-bit integer, and the conversion fails.
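
As a cross-check on the answer above, here is a minimal sketch of two other vectorized ways to read the nanoseconds out of the same subtraction. It reuses only the first two rows of the question's data. The variable names (`diff`, `ns_cast`, `ns_floor`, `ns_parts`) are made up for illustration, and the component-based variant assumes every difference is shorter than one second, as in the question.

```python
import pandas as pd

# First two rows of the question's data (illustrative subset)
s1 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.349001429",
                               "2021-05-21 00:02:38.195857153"]))
s2 = pd.Series(pd.to_datetime(["2021-05-21 00:02:11.348997322",
                               "2021-05-21 00:02:38.195852267"]))

diff = s1 - s2  # timedelta Series; the nanoseconds are still in there

# 1) Cast to int64, as in the answer above
ns_cast = diff.astype("int64")

# 2) Floor-divide by a one-nanosecond Timedelta, which yields integers
ns_floor = diff // pd.Timedelta(1, unit="ns")

# 3) Vectorized version of the loop from the question.
#    Assumption: each difference is below one second, because
#    .dt.microseconds only covers the sub-second part.
ns_parts = diff.dt.microseconds * 1000 + diff.dt.nanoseconds

print(ns_cast.tolist())   # [4107, 4886]
print(ns_floor.tolist())  # [4107, 4886]
print(ns_parts.tolist())  # [4107, 4886]
```

All three agree here. The floor-division form has the small advantage of stating the unit explicitly, so changing the divisor to `pd.Timedelta(1, unit="us")` would give whole microseconds instead.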