python pandas encryption parquet pyarrow

How to encrypt pandas Dataframe with pyarrow and parquet


I would like to encrypt a pandas DataFrame as a Parquet file using modular encryption. I thought the best way to do that is to convert the DataFrame to the pyarrow format and then save it to Parquet with a modular encryption option. Something like this:

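import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

d = {'col1': [1, 2], 'col2': [3, 4]}
df = pd.DataFrame(data=d)
# write_table expects a pyarrow Table, not a DataFrame
table = pa.Table.from_pandas(df)
# enc_prop is the missing piece: the encryption properties object
pq.write_table(table, "test.parquet", encryption_properties=enc_prop)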

My problem is that I'm stuck on creating the encryption_properties. Does anyone have an idea how to create them?

Big Thanks, Seb


Solution

  • The Apache Arrow repo contains an example Python file for writing and reading an encrypted Parquet file using master keys managed by HashiCorp Vault KMS.

    More info:

    Hope that helps.