I'm using pyarrow, pyarrow.parquet, and pandas. When I write a pandas datetime64[ns]
series to a Parquet file and load it again via a Drill query, the query shows an integer like 1467331200000000, which does not look like a regular UNIX timestamp in seconds.
The query looks like this:
SELECT workspace.id-column AS id-column, workspace.date-column AS date-column
When I open that file within Python again, it loads correctly and still has its datetime64[ns]
type.
Any idea what's going wrong and how to solve this? I want this value to be shown as a regular date.
OK, I found a solution some days ago that I would like to share. I think I initially missed something. It is important to downcast the timestamps to [ms] and to allow truncated timestamps when writing the dataframe to Parquet, so that Drill can open the file without issues:
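import pyarrow.parquet as pq

# Store timestamps as milliseconds and allow sub-millisecond precision to be dropped
pq.write_table(table, rf'{name}.parquet',
               coerce_timestamps='ms',
               allow_truncated_timestamps=True)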
When I define a view in Drill, I can then cast that column to a date or timestamp as required.
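
For context on the integer from the question: 1467331200000000 is consistent with a UNIX timestamp counted in microseconds rather than seconds, so Drill was presumably showing the raw stored value without a timestamp type attached. The sketch below uses only the standard library to check that reading. The helper `from_epoch` is a hypothetical name introduced for illustration and is not part of pyarrow or Drill.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_epoch(value: int, unit: str) -> datetime:
    """Interpret an integer offset from the UNIX epoch in 's', 'ms' or 'us'."""
    per_second = {"s": 1, "ms": 1_000, "us": 1_000_000}[unit]
    seconds, remainder = divmod(value, per_second)
    # Scale the sub-second remainder up to microseconds for timedelta.
    micros = remainder * (1_000_000 // per_second)
    return EPOCH + timedelta(seconds=seconds, microseconds=micros)


raw = 1467331200000000  # value shown by the Drill query in the question

# Read as microseconds, the value is a plausible calendar date.
as_us = from_epoch(raw, "us")

# Truncating to milliseconds (what coerce_timestamps='ms' stores) keeps the same instant.
as_ms = from_epoch(raw // 1000, "ms")

print(as_us.isoformat())  # 2016-07-01T00:00:00+00:00
print(as_ms == as_us)     # True
```

Since the sample value has no sub-millisecond component, truncating it to milliseconds loses nothing here; values with finer precision would be rounded down to the millisecond.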