apache-spark, amazon-s3, spark-structured-streaming, ceph

Multiple S3 credentials in a Spark Structured Streaming application


I want to migrate our Delta Lake from S3 to Parquet files in our own on-prem Ceph storage; both are accessible through the S3-compliant s3a API in Spark. Is it possible to provide different credentials for readStream and writeStream to achieve this?


Solution

  • The s3a connector supports per-bucket configuration, so you can declare a different set of secrets, endpoint, etc. for your internal buckets than for your external ones (see the sketch after this list).

    Consult the Hadoop s3a documentation for the normative details and examples.
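
Below is a minimal sketch of how per-bucket s3a configuration could look in a Spark Structured Streaming job. The bucket names (delta-bucket, ceph-bucket), the Ceph endpoint URL, and the environment-variable names are placeholders, not values from the original question; only the `fs.s3a.bucket.<bucket>.*` option pattern comes from the Hadoop s3a documentation.

```scala
import org.apache.spark.sql.SparkSession

// Per-bucket options override the global fs.s3a.* settings for that bucket only.
val spark = SparkSession.builder()
  .appName("s3-to-ceph-migration")
  // Credentials for the external AWS bucket holding the Delta table (source)
  .config("spark.hadoop.fs.s3a.bucket.delta-bucket.access.key", sys.env("AWS_ACCESS_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.delta-bucket.secret.key", sys.env("AWS_SECRET_KEY"))
  // Credentials and endpoint for the on-prem Ceph bucket (sink)
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.access.key", sys.env("CEPH_ACCESS_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.secret.key", sys.env("CEPH_SECRET_KEY"))
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.endpoint", "https://ceph.internal:8443")
  .config("spark.hadoop.fs.s3a.bucket.ceph-bucket.path.style.access", "true")
  .getOrCreate()

// readStream and writeStream each resolve the per-bucket settings
// from the bucket name in their own s3a:// URI.
val events = spark.readStream
  .format("delta")
  .load("s3a://delta-bucket/events")

events.writeStream
  .format("parquet")
  .option("checkpointLocation", "s3a://ceph-bucket/checkpoints/events")
  .start("s3a://ceph-bucket/events")
```

The same keys can also go into spark-defaults.conf or core-site.xml instead of the session builder; the point is that credential resolution is keyed off the bucket in each URI, so no readStream/writeStream-specific credential switch is needed.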