Let's say I have a pyarrow table with a column 'timestamp' containing float64 values.
These floats are actually timestamps expressed in seconds.
For instance:
import pyarrow as pa
my_table = pa.table({'timestamp': pa.array([1600419000.477,1600419001.027])})
I read about Parquet logical types in the documentation. How can I convert these float values to the logical type TIMESTAMP? I couldn't find any documentation on how to do this.
Thank you for your help.
You need to convert the floats into an actual pyarrow timestamp type. Once the column has that type, writing the table to Parquet will automatically store it with the Parquet TIMESTAMP logical type.
Using the pyarrow.compute module, this conversion can be done entirely in pyarrow. It is a bit less ergonomic than doing the conversion in pandas, but it avoids a round trip to pandas and back:
>>> import pyarrow.compute as pc
>>> arr = pa.array([1600419000.477,1600419001.027])
>>> pc.multiply(arr, pa.scalar(1000.)).cast("int64").cast(pa.timestamp('ms'))
<pyarrow.lib.TimestampArray object at 0x7fe5ec3df588>
[
2020-09-18 08:50:00.477,
2020-09-18 08:50:01.027
]