python · parquet · pyarrow

pyarrow writing Parquet files keeps overwriting existing data sets


I'm trying to write to an existing Parquet file stored on the local filesystem. But each write overwrites the previous data instead of appending to it.

from datetime import datetime
import os
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq


def append_to_parquet_table(dataframe, filename):
    full_path = os.path.join('.', filename)
    table = pa.Table.from_pandas(dataframe)
    writer = pq.ParquetWriter(full_path, table.schema)
    writer.write_table(table=table)
    writer.close()  # flushes the footer; without it the file may be unreadable

def save(passed):
    data = {'number': [1234], 
            'verified': [passed], 
            'date': datetime.now().strftime("%Y-%m-%d %H:%M:%S")}
    data_df = pd.DataFrame(data)
    append_to_parquet_table(data_df, 'results.parquet')

save(True)
save(False)

Why is the first data set being "updated" instead of a new one written?


Solution

  • I'm trying to write to an existing Parquet file stored on the local filesystem.

    This isn't supported by the file format: Parquet files are immutable once written, so there is no in-place append. Each call to `pq.ParquetWriter(full_path, schema)` opens the file for writing from scratch, truncating whatever was there before — that's why the second `save()` replaces the first. To accumulate rows, either keep a single `ParquetWriter` open and call `write_table()` once per batch (each call adds a row group to the same file), or write each batch to its own file in a shared directory and read them back together as a dataset.
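
    A minimal sketch of the first approach, adapted from the question's code (the filename and the sample values are illustrative): one `ParquetWriter` stays open for the whole run, every batch must share the same schema, and each `write_table()` call appends a row group rather than truncating the file.

    ```python
    import pandas as pd
    import pyarrow as pa
    import pyarrow.parquet as pq

    df1 = pd.DataFrame({'number': [1234], 'verified': [True]})
    df2 = pd.DataFrame({'number': [5678], 'verified': [False]})

    # Derive the schema once; every batch written must match it exactly.
    schema = pa.Table.from_pandas(df1).schema

    # ParquetWriter is a context manager: the file is opened once,
    # each write_table() appends a row group, and the footer is
    # written when the block exits.
    with pq.ParquetWriter('results.parquet', schema) as writer:
        for df in (df1, df2):
            writer.write_table(pa.Table.from_pandas(df))

    print(pq.read_table('results.parquet').num_rows)  # 2
    ```

    If the batches arrive in separate process runs (so the writer can't stay open), the usual pattern is to write each batch to its own file under a common directory and read the whole directory back with `pq.read_table('some_dir/')`, which treats the files as one dataset.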