I have an array, recognised as a 'numpy.ndarray' object, which prints the output below when I run the following code:
with sRW.SavReaderNp('C:/Users/Sam/Downloads/Data.sav') as reader:
    record = reader.all()
print(record)
Output:
[(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'Sam', 250000., '2019-08-05T00:00:00.000000')
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'James', 250000., '2019-08-05T00:00:00.000000')
(b'61D8894E-7FB0-3DE6-E053-6C04A8C01207', b'Mark', 250000., '0001-01-01T00:00:00.000000')
I want to load these records into a pandas DataFrame with pd.DataFrame and handle the empty date values, but when I run the following code an error appears (shown below the code):
SPSS_df = pd.DataFrame(record)
Error: "Out of bounds nanosecond timestamp: 1-01-01 00:00:00"
I've read through the SavReader module documentation, which says that when a datetime value is not found, the following date is assigned:
datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)
How can I process this date without hitting the error, perhaps by changing or manipulating the code above?
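The error comes from the range of pandas' default nanosecond timestamps, which start in the year 1677, so the year-1 placeholder cannot be represented. A minimal check, assuming a standard pandas install (the variable name sentinel is only illustrative):

```python
import datetime

import pandas as pd

# The placeholder date described in the question (datetime.MINYEAR is 1).
sentinel = datetime.datetime(datetime.MINYEAR, 1, 1, 0, 0, 0)

# Earliest value a nanosecond-resolution pandas Timestamp can hold.
earliest = pd.Timestamp.min

print(sentinel.year, earliest.year)

# True: the placeholder lies before the earliest representable timestamp,
# which is why converting it raises the out-of-bounds error.
print(sentinel.year < earliest.year)
```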
You can read all the records as strings (object dtype) and then convert each column to the type you want (float and datetime):
import numpy as np
import pandas as pd
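# Sample records copied from the question's output.
record = [
    (
        b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
        b'Sam',
        250000.0,
        '2019-08-05T00:00:00.000000',
    ),
    (
        b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
        b'James',
        250000.0,
        '2019-08-05T00:00:00.000000',
    ),
    (
        b'61D8894E-7FB0-3DE6-E053-6C04A8C01207',
        b'Mark',
        250000.0,
        '0001-01-01T00:00:00.000000',
    ),
]
# Load everything as object first so no date is parsed at construction time.
SPSS_df = (
    pd.DataFrame(record, dtype=object)
    .rename({2: 'some_float', 3: 'dates'}, axis='columns')
    .assign(
        # np.float was removed in NumPy 1.24; np.float64 is the explicit equivalent.
        some_float=lambda x: x['some_float'].astype(np.float64),
        # errors='coerce' turns dates that pandas cannot represent into NaT.
        dates=lambda x: pd.to_datetime(x['dates'], errors='coerce'),
    )
)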
This gives:
0 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' b'Sam' 250000.0 2019-08-05
1 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' b'James' 250000.0 2019-08-05
2 b'61D8894E-7FB0-3DE6-E053-6C04A8C01207' b'Mark' 250000.0 NaT
and the types:
SPSS_df.dtypes
0 object
1 object
some_float float64
dates datetime64[ns]
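Note that errors='coerce' turns every unparseable value into NaT, not just the placeholder. If you would rather blank out only the placeholder and let any other bad value raise, something like the following sketch may work. It assumes the dates are held as strings, as in the sample above; SENTINEL, dates, cleaned and parsed are illustrative names, not part of SavReader:

```python
import pandas as pd

# Assumed string form of the placeholder, taken from the sample output.
SENTINEL = '0001-01-01T00:00:00.000000'

dates = pd.Series(
    ['2019-08-05T00:00:00.000000', SENTINEL],
    dtype=object,
)

# Replace only the placeholder with a missing value.
cleaned = dates.mask(dates == SENTINEL)

# Default errors='raise': any other malformed date still raises an error.
parsed = pd.to_datetime(cleaned)

print(parsed)
```

The placeholder row ends up as NaT, while genuinely malformed dates are not silently hidden.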