How to save a Spark dataset in encrypted format?


I am saving my Spark dataset as a Parquet file on my local machine. I would like to know if there is a way to encrypt the data with an encryption algorithm. The code I am using to save my data as a Parquet file looks something like this:

dataset.write().mode("overwrite").parquet(parquetFile);

I saw a similar question, but mine is different because I am writing to my local disk.


Solution

  • I don't think you can do this through Spark directly. However, there are other projects you can put around Parquet, in particular Apache Arrow. I think this video explains how to do it:

    https://databricks.com/session_na21/data-security-at-scale-through-spark-and-parquet-encryption

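    UPDATE: since Spark 3.2.0 this is possible from Spark itself. Parquet columnar encryption is supported when Spark runs with Apache Parquet 1.12 or later.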
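
A minimal sketch of what the Spark 3.2+ route might look like from Java, assuming the property names documented for Parquet columnar encryption. The helper class `ParquetEncryptionSettings`, the key IDs `keyA` and `keyB`, the sample key values, and the column name `secretColumn` used below are hypothetical placeholders. `InMemoryKMS` is a mock KMS client meant only for testing (it ships in the parquet-hadoop tests artifact), so a real setup would need its own KMS client class instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper that only collects the settings; it does not touch Spark itself.
final class ParquetEncryptionSettings {

    // Properties to set on the Hadoop configuration of the SparkContext.
    static Map<String, String> hadoopProperties() {
        Map<String, String> props = new LinkedHashMap<>();
        // Activates Parquet encryption driven by Hadoop properties.
        props.put("parquet.crypto.factory.class",
                "org.apache.parquet.crypto.keytools.PropertiesDrivenCryptoFactory");
        // Assumption: mock KMS for local experiments only, not for production.
        props.put("parquet.encryption.kms.client.class",
                "org.apache.parquet.crypto.keytools.mocks.InMemoryKMS");
        // Sample base64 master keys, needed only by the mock KMS; not real secrets.
        props.put("parquet.encryption.key.list",
                "keyA:AAECAwQFBgcICQoLDA0ODw==, keyB:AAECAAECAAECAAECAAECAA==");
        return props;
    }

    // Per-write options: which master key protects the footer and which protects the columns.
    static Map<String, String> writeOptions(String footerKeyId, String columnKeyId,
                                            String... columns) {
        Map<String, String> opts = new LinkedHashMap<>();
        opts.put("parquet.encryption.footer.key", footerKeyId);
        // Expected format: "<master key ID>:<column>,<column>"
        opts.put("parquet.encryption.column.keys",
                columnKeyId + ":" + String.join(",", columns));
        return opts;
    }
}
```

Assuming a `SparkSession` named `spark`, these settings could then be applied with `ParquetEncryptionSettings.hadoopProperties().forEach(spark.sparkContext().hadoopConfiguration()::set)` followed by `dataset.write().options(ParquetEncryptionSettings.writeOptions("keyB", "keyA", "secretColumn")).mode("overwrite").parquet(parquetFile)`. Only the listed columns are encrypted with `keyA`; the file footer is protected with `keyB`.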