python-3.x pyspark parquet

Writing a DataFrame to a Parquet file, but no headers are being written


I have the following code:

df.show(3)  # show() prints the rows itself and returns None, so it needs no print() wrapper
print(df.columns)



df.select('port', 'key', 'return_b', 'return_a', 'return_c', 'return_d', 'return_g').write.format("parquet").save("qwe.parquet")

For some reason, this doesn't write the DataFrame to the Parquet file with its headers. The print statement above shows that those columns exist, but the Parquet file doesn't seem to have those headers.

I have also tried:

df.write.option("header", "true").mode("overwrite").parquet(write_folder)

Solution

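  • You may find df.to_parquet(...) more convenient. Note that to_parquet is a pandas (and pandas-on-Spark) DataFrame method; a plain PySpark DataFrame does not have it, so you would need to convert first, for example with df.toPandas() if the data fits in driver memory.

    If you wish to project down to selected columns, do that first, and then write to parquet.

    Parquet keeps column names in the file's schema metadata rather than in a header row, so the "header" option (a CSV setting) has no effect on a Parquet write.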