
"Out of bounds nanosecond timestamp"? How do you avoid this error?


I have an array (its type is reported as numpy.ndarray) that prints the output below when I run the following code:

import savReaderWriter as sRW  # assumed import; sRW is the alias used below

with sRW.SavReaderNp('C:/Users/Sam/Downloads/Data.sav') as reader:
    record = reader.all()
print(record)

Output:

[(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'Sam', 250000., '2019-08-05T00:00:00.000000')
 (b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'James',  250000., '2019-08-05T00:00:00.000000')
 (b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'Mark', 250000., '0001-01-01T00:00:00.000000')

I want to load these records, including the empty date values, into a pandas DataFrame with pd.DataFrame, but when I run the following code an error appears (shown below the code):

SPSS_df = pd.DataFrame(record)

Error: "Out of bounds nanosecond timestamp: 1-01-01 00:00:00"

I've read through the SavReader module's documentation and source code, and it says that if a datetime value is not found, the following date is assigned:

datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)

How can I process this date without hitting the error, perhaps by changing or manipulating the code above?
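
For context, the error comes from the range of pandas' default nanosecond-resolution Timestamp, which only covers roughly the years 1677 to 2262. The sketch below is only an illustration of that limit, assuming the default nanosecond resolution; the variable names are made up for the example.

```python
import datetime

import pandas as pd

# The placeholder date that, per the question, is assigned to missing values.
sentinel = datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)

# Bounds of a nanosecond-resolution pandas Timestamp.
print(pd.Timestamp.min)
print(pd.Timestamp.max)

# Year 1 lies far before the lower bound, so it cannot be stored as datetime64[ns].
is_out_of_bounds = sentinel.year < pd.Timestamp.min.year
print(is_out_of_bounds)
```

Any value outside that range has to be either replaced or coerced before the column can become datetime64[ns].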


Solution

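  • You can first read all the records as generic Python objects (dtype=object) and then convert each column to the type you want (float and datetime). Passing errors='coerce' to pd.to_datetime turns the out-of-bounds date into NaT instead of raising an error: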

    import numpy as np
    import pandas as pd
    
    record = [
        (
            b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
            b'Sam',
            250000.0,
            '2019-08-05T00:00:00.000000',
        ),
        (
            b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
            b'James',
            250000.0,
            '2019-08-05T00:00:00.000000',
        ),
        (
            b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
            b'Mark',
            250000.0,
            '0001-01-01T00:00:00.000000',
        ),
    ]
    
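    # Load everything as objects so pandas does not build timestamps yet.
    SPSS_df = pd.DataFrame(record, dtype=object).rename(
        {2: 'some_float', 3: 'dates'}, axis='columns'
    ).assign(
        # The builtin float replaces np.float, which newer NumPy no longer provides.
        some_float=lambda x: x['some_float'].astype(float),
        # errors='coerce' maps unparseable or out-of-bounds dates to NaT.
        dates=lambda x: pd.to_datetime(x['dates'], errors='coerce'),
    )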
    

    This gives:

                                             0         1  some_float      dates
    0  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'    b'Sam'    250000.0 2019-08-05
    1  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  b'James'    250000.0 2019-08-05
    2  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'   b'Mark'    250000.0        NaT
    

    and the types:

    SPSS_df.dtypes
    0                     object
    1                     object
    some_float           float64
    dates         datetime64[ns]
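
One caveat with errors='coerce' is that it turns every unparseable value into NaT, not just the placeholder for missing dates. A possible variation, sketched below under the assumption that the placeholder always appears as the exact string shown in the question's output, is to blank out only that value and let anything else unexpected still raise. The names SENTINEL and raw_dates are hypothetical and only stand in for the 'dates' column before conversion.

```python
import pandas as pd

# String form of the placeholder date, as it appears in the question's output.
SENTINEL = '0001-01-01T00:00:00.000000'

# Hypothetical stand-in for the 'dates' column before conversion.
raw_dates = pd.Series(
    ['2019-08-05T00:00:00.000000', '2019-08-05T00:00:00.000000', SENTINEL],
    name='dates',
)

# Replace only the placeholder with a missing value, then parse strictly:
# any other bad value would still raise instead of silently becoming NaT.
parsed = pd.to_datetime(raw_dates.mask(raw_dates == SENTINEL))

# Boolean mask of the rows whose date was empty in the source file.
missing = parsed.isna()
print(parsed)
print(missing.sum())
```

The missing mask can then be used to select or fill the empty dates, for example with SPSS_df[missing] on a frame that shares the same index.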