
Disable multipart upload on EMR


The goal is to disable multipart upload on Amazon EMR.

The guide says to enter classification=core-site,properties=[fs.s3.multipart.uploads.enabled=false] under Edit Software Settings when creating the EMR cluster.

My questions are:

  1. Can we modify the configuration of an existing EMR cluster? If so, how?
  2. Can we achieve the same result by calling sparkSession.sparkContext.hadoopConfiguration.set("fs.s3.multipart.uploads.enabled", "false") in the jar that is executed on EMR?

Solution

  • Unfortunately, you cannot currently modify the configuration of a running EMR cluster. If starting a new cluster is an option, you can use the AWS EMR Console to clone your current cluster, then edit the software settings before launching the clone. (Note: only the configuration is cloned, not any data stored in HDFS or on the cluster instances' local disks.)

    However, I believe the approach in your second question will work as intended. Have you tried it and found that it does not work?
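
    For reference, the approach from the second question could be sketched roughly as follows. This is a minimal sketch, not a confirmed fix: it assumes the Spark and Hadoop libraries are on the classpath (as they are on an EMR cluster), the object name, helper name, and app name are hypothetical, and whether EMRFS honors a value set at runtime is exactly what the question asks, so the property should be verified on the cluster. The property is set before any S3 access because Hadoop caches FileSystem instances, so a later change may not reach a file system that has already been created.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.spark.sql.SparkSession

// Hypothetical object name; only the property key and value come from the question.
object DisableMultipartUpload {
  val MultipartKey = "fs.s3.multipart.uploads.enabled"

  // Sets the property on the given Hadoop configuration.
  def disableMultipart(conf: Configuration): Unit =
    conf.set(MultipartKey, "false")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("disable-multipart-upload") // hypothetical app name
      .getOrCreate()

    // Apply the setting before the job reads from or writes to S3.
    disableMultipart(spark.sparkContext.hadoopConfiguration)

    // Print the value back so it shows up in the driver log.
    val current = spark.sparkContext.hadoopConfiguration.get(MultipartKey)
    println(s"$MultipartKey = $current")

    // ... the job's own S3 reads and writes would go here ...

    spark.stop()
  }
}
```

    An alternative that needs no code change is to pass the setting at submit time with --conf spark.hadoop.fs.s3.multipart.uploads.enabled=false, since Spark copies spark.hadoop.* properties into the Hadoop configuration. Whether this changes the upload behavior on a given EMR release is likewise an assumption to check.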