Consider a datetime-dtype Series containing some null values:
import pandas as pd
t = pd.date_range('2022-01-01', '2022-01-02', freq='4H').to_series(name='t')
t.iat[2] = None
2022-01-01 00:00:00 2022-01-01 00:00:00
2022-01-01 04:00:00 2022-01-01 04:00:00
2022-01-01 08:00:00 NaT
2022-01-01 12:00:00 2022-01-01 12:00:00
2022-01-01 16:00:00 2022-01-01 16:00:00
2022-01-01 20:00:00 2022-01-01 20:00:00
2022-01-02 00:00:00 2022-01-02 00:00:00
Freq: 4H, dtype: datetime64[ns]
The dtype of this Series is <M8[ns]. As far as I understand, this means that the underlying data is physically stored as 64-bit integers representing nanoseconds since the start of the Unix epoch (numpy.datetime64). However, I don't know how NaT is represented.
I would like to obtain an integer timestamp representing the number of nanoseconds since the start of the Unix epoch, with null values wherever there was an NaT in the data. It therefore seems like I should be able to check the precision of the timestamp data type, extract the raw integer values, and then scale up or down as needed to obtain nanoseconds.
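Both pieces of that puzzle can be inspected directly; a minimal sketch, assuming NumPy's datetime_data helper (the Series is rebuilt here so the snippet is self-contained):

```python
import numpy as np
import pandas as pd

t = pd.date_range('2022-01-01', '2022-01-02', freq='4h').to_series(name='t')
t.iat[2] = None

# datetime_data exposes the (unit, step) of a datetime64 dtype
unit, step = np.datetime_data(t.dtype)
print(unit, step)  # ns 1

# NaT is stored as the minimum 64-bit integer, i.e. np.iinfo(np.int64).min
print(int(np.datetime64('NaT').astype(np.int64)) == np.iinfo(np.int64).min)  # True
```

So for a datetime64[ns] Series the raw integers are already nanoseconds, and the only special case is the NaT sentinel.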
Expected output:
2022-01-01 00:00:00 1640995200000000000
2022-01-01 04:00:00 1641009600000000000
2022-01-01 08:00:00 <NA>
2022-01-01 12:00:00 1641038400000000000
2022-01-01 16:00:00 1641052800000000000
2022-01-01 20:00:00 1641067200000000000
2022-01-02 00:00:00 1641081600000000000
Freq: 4H, dtype: Int64
Things that did not work:
t.astype(pd.Int64Dtype()) # datetime64[ns] cannot be converted to IntegerDtype
t.astype(float).astype(pd.Int64Dtype()) # Cannot cast DatetimeArray to dtype float64
t.map(lambda t: t.timestamp()).astype(pd.Int64Dtype()) # NaTType does not support timestamp
t.map(lambda t: t.timestamp(), na_action='ignore').astype(pd.Int64Dtype()) # 'values' contains non-numeric NA NaT
Notably the .timestamp logic gets hung up on the NaT, which I don't know how to cleanly convert to the "null-aware" Int64Dtype.
Something that did work, but is very ugly:
((t - pd.Timestamp(1970, 1, 1)).dt.total_seconds() * 1e09).astype(pd.Int64Dtype())
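As an aside, besides being ugly this detour through float is lossy in general: float64 has a 53-bit mantissa while epoch nanoseconds need roughly 61 bits, so only timestamps whose integer value happens to have enough trailing zero bits (like the 4-hourly values above) survive the round-trip. A quick check with an illustrative value:

```python
import pandas as pd

# One nanosecond past midnight: the exact int64 value needs ~61 significant
# bits, more than float64's 53-bit mantissa can represent.
ts = pd.Timestamp('2022-01-01 00:00:00.000000001')
exact = ts.value                 # exact nanoseconds since the epoch
via_float = int(float(exact))    # what survives a detour through float64
print(exact, via_float, exact == via_float)  # the round-trip is lossy here
```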
Is there an idiomatic way to do this?
Edit
I don't think there is a good way to do that. Another approach:
>>> import numpy as np
>>> t.view(np.int64).astype(pd.Int64Dtype()).replace(np.iinfo(np.int64).min, pd.NA)
0 1640995200000000000
1 1641009600000000000
2 <NA>
3 1641038400000000000
4 1641052800000000000
5 1641067200000000000
6 1641081600000000000
Name: t, dtype: Int64
# Without replace(...)
>>> t.view(np.int64).astype(pd.Int64Dtype())
0 1640995200000000000
1 1641009600000000000
2 -9223372036854775808
3 1641038400000000000
4 1641052800000000000
5 1641067200000000000
6 1641081600000000000
Name: t, dtype: Int64
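A variant of the same idea that avoids hard-coding the sentinel is to record the NaT positions first and mask them afterwards; sketched here with .to_numpy().view(...) rather than Series.view, which is deprecated in recent pandas:

```python
import numpy as np
import pandas as pd

t = pd.date_range('2022-01-01', '2022-01-02', freq='4h').to_series(name='t')
t.iat[2] = None

# Reinterpret the underlying datetime64[ns] buffer as int64, then blank out
# the positions that were NaT instead of matching the sentinel by value.
na = t.isna().to_numpy()
out = pd.Series(t.to_numpy().view(np.int64), index=t.index, name=t.name)
out = out.astype(pd.Int64Dtype()).mask(na)
print(out)
```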
What you expect is unclear to me. Maybe you can use numpy:
ts = t.values.astype(float)
dti = pd.to_datetime(ts)
>>> ts
array([ 1.64099520e+18, 1.64100960e+18, -9.22337204e+18, 1.64103840e+18,
1.64105280e+18, 1.64106720e+18, 1.64108160e+18])
>>> dti
DatetimeIndex(['2022-01-01 00:00:00', '2022-01-01 04:00:00',
'NaT', '2022-01-01 12:00:00',
'2022-01-01 16:00:00', '2022-01-01 20:00:00',
'2022-01-02 00:00:00'],
dtype='datetime64[ns]', freq=None)
>>> t
0 2022-01-01 00:00:00
1 2022-01-01 04:00:00
2 NaT
3 2022-01-01 12:00:00
4 2022-01-01 16:00:00
5 2022-01-01 20:00:00
6 2022-01-02 00:00:00
Name: t, dtype: datetime64[ns]