apache-spark, amazon-s3, spark-structured-streaming, ceph

Multiple S3 credentials in a Spark Structured Streaming application


I want to migrate our Delta Lake from S3 to Parquet files in our own on-prem Ceph storage; both are accessible through the S3-compliant s3a API in Spark. Is it possible to provide different credentials for readStream and writeStream to achieve this?


Solution

  • The s3a connector supports per-bucket configuration, so you can declare a different set of secrets, endpoint, etc. for your internal buckets than for your external ones (see the sketch after this list).

    Consult the Hadoop s3a documentation for the normative details and examples.
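
Below is a minimal sketch of how per-bucket s3a configuration could look in a Spark Structured Streaming job. The bucket names (delta-bucket, ceph-bucket), the Ceph endpoint URL, and the environment-variable names are placeholders, not values from the original question; only the `fs.s3a.bucket.<bucket>.*` option pattern comes from the Hadoop s3a documentation.

```scala
import org.apache.spark.sql.SparkSession

// Per-bucket options override the global fs.s3a.* settings for that bucket only.
val spark = SparkSession.builder()
  .appName("s3-to-ceph-migration")
  // Credentials for the external AWS bucket holding the Delta table (source)
  .config("spark.hadoop.fs.s3a.bucket.delta-bucket.access.key", sys.env("AWS_ACCESS_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.delta-bucket.secret.key", sys.env("AWS_SECRET_KEY"))
  // Credentials and endpoint for the on-prem Ceph bucket (sink)
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.access.key", sys.env("CEPH_ACCESS_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.secret.key", sys.env("CEPH_SECRET_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.endpoint", "https://ceph.internal:8443")
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.path.style.access", "true")
  .getOrCreate()

// readStream and writeStream each resolve the per-bucket settings
// from the bucket name in their own s3a:// URI.
val events = spark.readStream
  .format("delta")
  .load("s3a://delta-bucket/events")

events.writeStream
  .format("parquet")
  .option("checkpointLocation", "s3a://ceph-bucket/checkpoints/events")
  .start("s3a://ceph-bucket/events")
```

The same keys can also go into spark-defaults.conf or core-site.xml instead of the session builder; the point is that credential resolution is keyed off the bucket in each URI, so no readStream/writeStream-specific credential switch is needed.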