delta-lake

How to convert delta to parquet


I hope this is not a dumb question. I have a use case that requires converting a Delta table to plain Parquet.

The most relevant answer I found in online discussions is: (1) run VACUUM with a retention of 0 hours so that only the files of the latest version remain, and (2) delete the _delta_log directory, which holds the metadata and transaction log for the Delta format.
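Step (2) of that procedure can be sketched on a mock table directory. This is plain Python on invented file names, just to illustrate what removing the transaction log leaves behind; it is not a substitute for running VACUUM first.

```python
import pathlib
import shutil
import tempfile

# Build a toy directory that mimics a Delta table layout
# (the file names here are made up for illustration).
root = pathlib.Path(tempfile.mkdtemp())
(root / "_delta_log").mkdir()
(root / "_delta_log" / "00000000000000000000.json").touch()
(root / "part-0000.snappy.parquet").touch()

# Step (2): drop the transaction log so only parquet files remain.
shutil.rmtree(root / "_delta_log")

remaining = sorted(p.name for p in root.iterdir())
print(remaining)  # ['part-0000.snappy.parquet']
```

After this, the directory is just a set of .parquet files, which is why VACUUM must run first: any stale files that VACUUM did not remove would otherwise be picked up by a plain Parquet reader.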

Is that normally enough to convert a Delta table to Parquet?

I did some searching and reading online, and I still have the questions below.

For the Parquet format, we have multiple .parquet files that together represent the whole dataset.

For Delta, we have multiple "versions" of parquet files. Does each version represent the whole dataset? Or, more precisely, does each one capture a different state (snapshot) of the dataset? How exactly does VACUUM deal with these .parquet files?
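The version/snapshot intuition can be sketched with a toy model (this is not Delta's real implementation, just the bookkeeping idea): each commit records files added and removed, a version's snapshot is the cumulative set of live files, and VACUUM deletes files that no snapshot inside the retention window still references.

```python
# Toy model of Delta versioning: two commits, where the second
# rewrites part-0 into part-2 (all names are invented).
commits = [
    {"add": ["part-0.parquet", "part-1.parquet"], "remove": []},    # v0
    {"add": ["part-2.parquet"], "remove": ["part-0.parquet"]},      # v1
]

def snapshot(version):
    """Files that make up the dataset as of a given version."""
    live = set()
    for commit in commits[: version + 1]:
        live |= set(commit["add"])
        live -= set(commit["remove"])
    return live

all_files = set().union(*(c["add"] for c in commits))
latest = snapshot(len(commits) - 1)

# With retention 0, only the latest snapshot is protected, so
# VACUUM would delete every file outside it.
vacuumed = all_files - latest
print(sorted(latest))    # ['part-1.parquet', 'part-2.parquet']
print(sorted(vacuumed))  # ['part-0.parquet']
```

So no single .parquet file is "the dataset": a snapshot is a set of files, newer versions mostly reuse the previous version's files, and VACUUM physically deletes only the files that no retained snapshot points to.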

Thanks


Solution

  • You can read the Delta table into a DataFrame and write it back out in Parquet format.

    This example uses PySpark (note that df.write.parquet produces a directory of part files, not a single file):

    # Read the Delta table and rewrite it as plain Parquet
    df = spark.read.format("delta").load("/tmp/delta-table")
    df.write.parquet("/tmp/parquet-table")