I'm working with a .sav (SPSS) file in Python. After importing it with pyreadstat (or with pandas), all the variables look fine except the datetime variables. They are read in as floats shown in exponential notation, but in SPSS they are of type date with the format dd-mmm-yy (e.g., 02-feb-2021).
This is what the date variable looks like:
1.383160e+10
Is there a way to convert this format to datetime using Python?
I've tried various approaches with the datetime and time modules, but I always get a date in the year 2408:
import time
# Here I'm using the float from the first row in the dataframe
time.gmtime(13831603200)
The result:
time.struct_time(tm_year=2408, tm_mon=4, tm_mday=22, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=1, tm_yday=113, tm_isdst=0)
When I use the datetime module:
from datetime import datetime
python_date = datetime.fromtimestamp(13831603200).strftime('%d-%b-%Y, %H:%M:%S')
print(python_date)
22-Apr-2408, 00:00:00
[How the datetime variable (Vdatesub) is showing when using Python][1]
[1]: https://i.sstatic.net/I7yza.png
This is answered under these two posts (one Python, one R):
Convert 'seconds since October 14, 1582' to Python datetime
Read SPSS file into R, the data format for date is wrong, and generate more variable
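In short: SPSS stores the date as the number of seconds since 14 Oct 1582, while Python timestamps count from the Unix epoch (01 Jan 1970). That is why passing the raw value straight to time.gmtime or datetime.fromtimestamp gives a date in the year 2408.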
You would need to calculate the number of seconds between 1582-10-14 and 1970-01-01 to adjust the timestamp value as per this post:
Timestamp out of range for platform localtime()/gmtime() function
(That difference is 141,428 days, i.e. 12,219,379,200 seconds.)
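As a minimal sketch of that adjustment, using only the standard library: the offset is computed from the two origin dates rather than hard-coded, and the value from the question is converted both ways. The names `SPSS_ORIGIN`, `spss_to_datetime` and `spss_to_unix` are made up for this example; they are not part of pyreadstat or pandas.

```python
from datetime import datetime, timedelta, timezone

# SPSS counts seconds from 1582-10-14; Unix timestamps count from 1970-01-01.
SPSS_ORIGIN = datetime(1582, 10, 14)
UNIX_EPOCH = datetime(1970, 1, 1)

# Number of seconds between the two origins.
SPSS_TO_UNIX_OFFSET = int((UNIX_EPOCH - SPSS_ORIGIN).total_seconds())


def spss_to_datetime(spss_seconds):
    """Convert an SPSS date value (seconds since 1582-10-14) to a naive datetime."""
    return SPSS_ORIGIN + timedelta(seconds=spss_seconds)


def spss_to_unix(spss_seconds):
    """Shift an SPSS date value so that it counts from the Unix epoch instead."""
    return spss_seconds - SPSS_TO_UNIX_OFFSET


value = 13831603200  # the float from the first row, as in the question

print(SPSS_TO_UNIX_OFFSET)
print(spss_to_datetime(value))
print(datetime.fromtimestamp(spss_to_unix(value), tz=timezone.utc))
```

For the value in the question, both routes give 2 Feb 2021, which matches the date shown in SPSS. For a whole dataframe column the same function could be applied row by row, e.g. `df['Vdatesub'].apply(spss_to_datetime)`, assuming the column has no missing values (a NaN would make `timedelta` raise an error).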