I have a Parquet file that is compressed with the BROTLI codec. BROTLI is not supported by Trino, so I need to convert the file to a supported codec such as GZIP or SNAPPY. The conversion doesn't seem straightforward, or at least I could not find any Python library that does it. Please share your ideas or strategies for this codec conversion.
You should be able to do this with pyarrow. It can read brotli-compressed Parquet files.
import pyarrow.parquet as pq

# Placeholder paths: replace with your own input and output files.
table = pq.read_table("input.parquet")
pq.write_table(table, "output.parquet")
This will save it as a snappy-compressed file by default. You can specify different compression schemes using the compression keyword argument.