amazon-web-services, apache-spark, jar, amazon-emr

Spark on EMR - Downloading Different Jar Files


I am using a bootstrap action to download a MySQL connector jar file into the spark/jars folder. I use the following command:

sudo aws s3 cp s3://buck/emrtest/mysql-connector-java-5.1.39-bin.jar /usr/lib/spark/jars

Everything downloads correctly, but I eventually get a provisioning error and the cluster terminates. I get this error:

On 5 slave instances (including i-0505b9beda64e9,i-0f85f4664e1359 and i-00d346a73f717b), application provisioning failed

It doesn't fail on my master node but fails on my slave nodes. I have checked my logs, and they don't give me any information. Why does this fail, and how would I go about downloading this jar file to every node in a bootstrap fashion?

Thanks!


Solution

  • I figured out the answer. First off, nothing useful is logged for this failure. The master node still launches despite the failure.

    I was retrieving a file from a private S3 bucket. Note: your AWS configs are not inherited by your EMR cluster.
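
    Since the failure is not logged, here is a rough sketch of a bootstrap script that at least reports it. It assumes the cluster's EC2 instance profile has been granted read access to the private bucket, which the original answer does not spell out. The S3 path and destination come from the question; `fetch_jar` is a made-up helper name, not part of EMR or the AWS CLI.

```shell
#!/bin/bash
# Sketch of a bootstrap action that copies a jar from S3 and reports failures.
# Assumption: the nodes get their S3 access from the cluster's EC2 instance
# profile, because AWS configs from your own machine are not present on them.

# Values taken from the question; override them through the environment if needed.
JAR_S3_URI="${JAR_S3_URI:-s3://buck/emrtest/mysql-connector-java-5.1.39-bin.jar}"
DEST_DIR="${DEST_DIR:-/usr/lib/spark/jars}"

# fetch_jar is a hypothetical helper: copy one S3 object and log the outcome.
fetch_jar() {
  local src="$1" dest="$2"
  # Write the identity this node is using to stderr, to help debug access errors.
  aws sts get-caller-identity >&2 \
    || echo "WARN: no usable AWS credentials on ${HOSTNAME:-unknown}" >&2
  # Same copy command as in the question, with an explicit error message.
  if ! sudo aws s3 cp "$src" "$dest"; then
    echo "ERROR: failed to copy $src to $dest on ${HOSTNAME:-unknown}" >&2
    return 1
  fi
  echo "Copied $src to $dest"
}
```

    Ending the script with `fetch_jar "$JAR_S3_URI" "$DEST_DIR"` runs the copy on every node the bootstrap action executes on, and a denied request then leaves an explicit error line in the script's stderr.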