
How to append data to existing Parquet from Polars


I have multiple Polars DataFrames that I want to append to an existing Parquet file.

Calling df.write_parquet("path.parquet") overwrites the existing Parquet file. How can I append to it instead?


Solution

  • Polars does not support appending to Parquet files, and neither do most other tools (see, for example, this SO post).

    Your best bet is to convert the DataFrame to an Arrow table with .to_arrow() and write it with pyarrow.dataset.write_dataset. In particular, see the documentation of the existing_data_behavior parameter. With "overwrite_or_ignore", new files can be written into a directory that already contains data, but files with the same name are overwritten, so each write needs a unique basename_template.

    This still requires organizing your data as partitions, which effectively means a separate Parquet file per partition, all stored in the same directory. Each DataFrame you write becomes its own Parquet file, and you abstract away from that when reading. As far as I'm aware, Polars does not support writing partitions, but it does support reading them; see the source argument of pl.read_parquet.