Tags: python, pandas, numpy, datetime, timezone

Why is tz-naive Timestamp converted to integer while tz-aware is kept as Timestamp?


Understandable and expected (tz-aware):

import datetime
import numpy as np
import pandas as pd

aware = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"], tz="UTC")
eod = datetime.datetime.combine(aware[-1].date(), datetime.time.max, aware.tz)
aware, eod, np.concat([aware, [eod]])

returns

(DatetimeIndex(['2024-11-21 00:00:00+00:00', '2024-11-21 12:00:00+00:00'],
               dtype='datetime64[ns, UTC]', freq=None),
 datetime.datetime(2024, 11, 21, 23, 59, 59, 999999,
                   tzinfo=datetime.timezone.utc),
 array([Timestamp('2024-11-21 00:00:00+0000', tz='UTC'),
        Timestamp('2024-11-21 12:00:00+0000', tz='UTC'),
        datetime.datetime(2024, 11, 21, 23, 59, 59, 999999,
                          tzinfo=datetime.timezone.utc)],
       dtype=object))

Note the Timestamps (and a datetime) in the return value of np.concat.

Unexpected (tz-naive):

naive = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"])
eod = datetime.datetime.combine(naive[-1].date(), datetime.time.max, naive.tz)
naive, eod, np.concat([naive, [eod]])

returns

(DatetimeIndex(['2024-11-21 00:00:00', '2024-11-21 12:00:00'],
               dtype='datetime64[ns]', freq=None),
 datetime.datetime(2024, 11, 21, 23, 59, 59, 999999),
 array([1732147200000000000, 1732190400000000000,
        datetime.datetime(2024, 11, 21, 23, 59, 59, 999999)], dtype=object))

Note the integers (and a datetime) in the return value of np.concat.

  1. Why do I get integers in the concatenated array for a tz-naive index?
  2. How do I avoid it? I.e., how do I append EOD to a tz-naive DatetimeIndex?

PS: Interestingly enough, the underlying NumPy arrays (.values) of the two indexes are identical:

np.testing.assert_array_equal(aware.values, naive.values)

Solution

  • From "Data type promotion in NumPy":

    When mixing two different data types, NumPy has to determine the appropriate dtype for the result of the operation. This step is referred to as promotion or finding the common dtype.
    In typical cases, the user does not need to worry about the details of promotion, since the promotion step usually ensures that the result will either match or exceed the precision of the input.

    np.concat() accepts a casting keyword argument (the default is casting="same_kind").
    With casting='no', the call fails:

    naive_no = np.concat([naive, [eod]], casting='no')
    
    TypeError: Cannot cast array data from dtype('<M8[ns]') to dtype('O') according to the rule 'no'
    

    See Array-protocol type strings.

    With casting='same_kind', the call succeeds and the result again has dtype object:

    naive_sk = np.concat([naive, [eod]], casting='same_kind')
    print(naive_sk.dtype, naive_sk)
    

    Result

    object [1732147200000000000 1732190400000000000
     datetime.datetime(2024, 11, 21, 23, 59, 59, 999999, tzinfo=<DstTzInfo 'America/New_York' LMT-1 day, 19:04:00 STD>)]
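
The integers can be traced to how each index is handed to NumPy before promotion. The following is a minimal sketch, not taken from the answer above; the names aware_arr, naive_arr, as_ns and as_us are only illustrative. It assumes that NumPy has no tz-aware dtype, so the tz-aware index arrives as an object array of Timestamps, while the tz-naive index arrives as a datetime64 array that then has to be cast to object.

```python
import datetime

import numpy as np
import pandas as pd

aware = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"], tz="UTC")
naive = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"])

# tz-aware: exposed to NumPy as an object array whose elements are Timestamps.
aware_arr = np.asarray(aware)
# tz-naive: exposed to NumPy as a native datetime64 array.
naive_arr = np.asarray(naive)
print(aware_arr.dtype, naive_arr.dtype)

# Casting nanosecond datetime64 to object yields plain integers
# (nanoseconds since the epoch), because datetime.datetime only
# has microsecond resolution.
as_ns = naive_arr.astype("datetime64[ns]").astype(object)
# At microsecond resolution the same cast yields datetime.datetime objects.
as_us = naive_arr.astype("datetime64[us]").astype(object)
print(type(as_ns[0]), isinstance(as_us[0], datetime.datetime))
```

So the object array built from the tz-aware index already contains Timestamps and is concatenated as is, whereas the tz-naive values go through the datetime64[ns]-to-object cast shown above.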
    

    Tested with Python 3.9 and pandas 2.2.2.
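
As for the second part of the question (appending EOD to a tz-naive DatetimeIndex), one possible approach is to stay within pandas rather than going through NumPy. This is a hedged sketch, not something the answer above proposes; the variable names extended and as_objects are illustrative. It uses Index.append and, if an object array is really needed, an explicit astype(object) before concatenating. np.concatenate is used here because np.concat is only available as an alias in NumPy 2.0 and later.

```python
import datetime

import numpy as np
import pandas as pd

naive = pd.DatetimeIndex(["2024-11-21", "2024-11-21 12:00"])
# naive.tz is None, so eod is a tz-naive datetime.
eod = datetime.datetime.combine(naive[-1].date(), datetime.time.max, naive.tz)

# Option 1: wrap the datetime in a DatetimeIndex and append it,
# so the result stays a tz-naive DatetimeIndex.
extended = naive.append(pd.DatetimeIndex([eod]))
print(extended)

# Option 2: if an object array is wanted, convert to object inside pandas
# first, so the elements are Timestamps rather than integers.
as_objects = np.concatenate([naive.astype(object), [eod]])
print(as_objects)
```

Option 1 keeps the datetime dtype and all DatetimeIndex functionality; Option 2 mirrors the shape of the tz-aware result in the question (Timestamps plus one datetime in an object array).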