python pandas append parquet

Appending to a parquet file while chunking


I am trying to write a pandas DataFrame to a parquet file in append mode. However, instead of the new data being appended to the existing file, the file is overwritten. What am I missing?

The write syntax is:

df.to_parquet(path, mode='append')

The read syntax is:

pd.read_parquet(path)

Solution

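  • You will have to use the fastparquet engine for this. pandas forwards extra keyword arguments of to_parquet to the engine, and fastparquet's writer accepts append=True. The file must already exist before you can append to it, so the first write has to create it.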

    import os.path

    import pandas as pd

    file_path = "D:\\dev\\output.parquet"
    df = pd.DataFrame(data={'col1': [1, 2], 'col2': [3, 4]})
    # The first write creates the file; later writes append to it.
    if not os.path.isfile(file_path):
        df.to_parquet(file_path, engine='fastparquet')
    else:
        df.to_parquet(file_path, engine='fastparquet', append=True)
    

    This is described in more detail in this answer: https://stackoverflow.com/a/74209756/6563567