python pandas multidimensional-array stringindexoutofbounds

"Out of bounds nanosecond timestamp"? How do you avoid this error?

I have an array, reported as a numpy.ndarray object, which prints the output below when I run the following code:

import savReaderWriter as sRW  # assumption: sRW is an alias for the savReaderWriter package

with sRW.SavReaderNp('C:/Users/Sam/Downloads/Data.sav') as reader:
    record = reader.all()
print(record)

Output:

[(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'Sam', 250000., '2019-08-05T00:00:00.000000')
 (b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'James',  250000., '2019-08-05T00:00:00.000000')
 (b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'Mark', 250000., '0001-01-01T00:00:00.000000')

I want to handle the empty date values in a pandas DataFrame built with pd.DataFrame, but when I run the following code an error appears (shown below the code):

SPSS_df = pd.DataFrame(record)

Error: "Out of bounds nanosecond timestamp: 1-01-01 00:00:00"

I've read through the SavReader module documentation, and it says that when a datetime value is not found, the following date is assigned:

datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)

How can I process this date without hitting this error, perhaps by changing or manipulating the code above?

Solution
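
For context, the error arises because the default pandas datetime type, datetime64[ns], stores each timestamp as a signed 64-bit count of nanoseconds since 1970-01-01. That range only covers roughly the years 1677 to 2262, so a date in year 1 cannot be represented. A minimal sketch of that arithmetic using NumPy only (the variable names are illustrative):

```python
import numpy as np

# datetime64[ns] stores a signed 64-bit count of nanoseconds since 1970-01-01.
int64_limits = np.iinfo(np.int64)

# The smallest int64 value is reserved for NaT, so the earliest date is min + 1.
earliest = np.datetime64(int64_limits.min + 1, 'ns')
latest = np.datetime64(int64_limits.max, 'ns')

print(earliest)  # 1677-09-21T00:12:43.145224193
print(latest)    # 2262-04-11T23:47:16.854775807

# Compare at second resolution, where year 1 is representable.
placeholder = np.datetime64('0001-01-01T00:00:00', 's')
print(placeholder < earliest.astype('datetime64[s]'))  # True
```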

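You can read all the records as objects (dtype=object) and then convert each column to the type you want (float and datetime). Passing errors='coerce' to pd.to_datetime turns dates that cannot be represented, such as 0001-01-01, into NaT: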

import numpy as np
import pandas as pd

record = [
    (
        b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
        b'Sam',
        250000.0,
        '2019-08-05T00:00:00.000000',
    ),
    (
        b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
        b'James',
        250000.0,
        '2019-08-05T00:00:00.000000',
    ),
    (
        b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
        b'Mark',
        250000.0,
        '0001-01-01T00:00:00.000000',
    ),
]

SPSS_df = pd.DataFrame(record, dtype=object).rename(
    {2: 'some_float', 3: 'dates'}, axis='columns'
).assign(
    # np.float was removed in NumPy 1.24; the builtin float is equivalent here
    some_float=lambda x: x['some_float'].astype(float),
    dates=lambda x: pd.to_datetime(x['dates'], errors='coerce'),
)

This gives:

                                         0         1  some_float      dates
0  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'    b'Sam'    250000.0 2019-08-05
1  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'  b'James'    250000.0 2019-08-05
2  b'61D8894E-7FB0-3DE6-E053-6C04A8C01207'   b'Mark'    250000.0        NaT

and the types:

SPSS_df.dtypes
0                     object
1                     object
some_float           float64
dates         datetime64[ns]
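
If you would rather not rely on errors='coerce', which also silently turns genuinely malformed dates into NaT, one option is to blank out the placeholder explicitly before parsing. A minimal sketch, assuming the dates arrive as ISO-format strings as in the example above; parse_spss_dates and MISSING_PREFIX are hypothetical names, not part of SavReader or pandas:

```python
import datetime

import pandas as pd

# Date part of the placeholder the question quotes for missing values:
# datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0) -> '0001-01-01'
MISSING_PREFIX = '{:04d}-01-01'.format(datetime.MINYEAR)


def parse_spss_dates(values):
    """Hypothetical helper: parse ISO date strings, turning the placeholder into NaT."""
    series = pd.Series(list(values), dtype=object)
    # Flag every entry whose text starts with the placeholder date.
    is_missing = series.astype(str).str.startswith(MISSING_PREFIX)
    # mask() replaces flagged entries with NaN, which to_datetime reads as NaT.
    return pd.to_datetime(series.mask(is_missing))


dates = parse_spss_dates([
    '2019-08-05T00:00:00.000000',
    '2019-08-05T00:00:00.000000',
    '0001-01-01T00:00:00.000000',
])
print(dates.isna().tolist())  # [False, False, True]
```

Because no errors argument is passed, any other value that cannot be parsed still raises, so unexpected bad data is not hidden.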