
Parquet file with more than one schema


I am used to Parquet files with a single schema. I came across a file which seemingly has more than one schema. I used pandas to convert it to a CSV file. The result looks something like this:

table-1,table-2,table-3
0, {data for table-1} {data for table-2} {data for table-3}

I read up on the Parquet file format, and it looks like a single Parquet file has a single schema.

Does Parquet support more than one schema in a single file?


Solution

  • No, the Parquet format supports only a single schema per file. This schema is written into the footer of the file and applies to every row group in it. You could probably read the CSV file back into pandas and save that as a Parquet file, but ultimately you will be better off saving each table as a separate file. That should also be more performant and space-efficient.