I have the following code:
print(df.show(3))
print(df.columns)
df.select('port', 'key', 'return_b', 'return_a', 'return_c', 'return_d', 'return_g').write.format("parquet").save("qwe.parquet")
For some reason this doesn't write the Dataframe into the parquet file with the headers. The print statement above shows me those columns exist but the parquet file doesn't have those headers.
I have also tried:
df.write.option("header", "true").mode("overwrite").parquet(write_folder)
You may find df.to_parquet(...)
more convenient.
If you wish to project down to selected columns, do that first, and then write to parquet.