python parquet pyarrow

How to convert a float to a Parquet TIMESTAMP Logical Type?


Let's say I have a pyarrow table with a column named timestamp containing float64 values. These floats are actually timestamps expressed in seconds. For instance:

import pyarrow as pa
my_table = pa.table({'timestamp': pa.array([1600419000.477,1600419001.027])})

I read about Parquet logical types in the documentation. How can I convert these float values to the TIMESTAMP logical type? I could not find any documentation on how to do this.

Thank you for your help. Have a good day.


Solution

  • You need to convert the floats to an actual timestamp type in pyarrow; the column is then written automatically as a Parquet TIMESTAMP logical type.

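    Using the pyarrow.compute module, this conversion can also be done in pyarrow itself (a bit less ergonomic than doing the conversion in pandas, but it avoids a round trip to pandas and back). Multiply the seconds by 1000 to get milliseconds, cast to int64, then cast to a millisecond timestamp: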

    >>> import pyarrow as pa
    >>> import pyarrow.compute as pc
    >>> arr = pa.array([1600419000.477, 1600419001.027])
    >>> pc.multiply(arr, pa.scalar(1000.)).cast("int64").cast(pa.timestamp('ms'))
    <pyarrow.lib.TimestampArray object at 0x7fe5ec3df588>
    [
      2020-09-18 08:50:00.477,
      2020-09-18 08:50:01.027
    ]