Normally, a spark-defaults.conf file exists in /etc/spark/conf after I create a Spark cluster on EMR. If I provide no custom configs, spark-defaults.conf sits happily in the conf directory:
[hadoop@ip-x-x-x-x ~]$ ls -la /etc/spark/conf/
total 64
drwxr-xr-x 2 root root 4096 Oct 4 08:08 .
drwxr-xr-x 3 root root 4096 Oct 4 07:41 ..
-rw-r--r-- 1 root root 987 Jul 26 21:56 docker.properties.template
-rw-r--r-- 1 root root 1105 Jul 26 21:56 fairscheduler.xml.template
-rw-r--r-- 1 root root 2373 Oct 4 07:42 hive-site.xml
-rw-r--r-- 1 root root 2024 Oct 4 07:42 log4j.properties
-rw-r--r-- 1 root root 2025 Jul 26 21:56 log4j.properties.template
-rw-r--r-- 1 root root 7239 Oct 4 07:42 metrics.properties
-rw-r--r-- 1 root root 7239 Jul 26 21:56 metrics.properties.template
-rw-r--r-- 1 root root 865 Jul 26 21:56 slaves.template
-rw-r--r-- 1 root root 2680 Oct 4 08:08 spark-defaults.conf
-rw-r--r-- 1 root root 1292 Jul 26 21:56 spark-defaults.conf.template
-rwxr-xr-x 1 root root 1563 Oct 4 07:42 spark-env.sh
-rwxr-xr-x 1 root root 3861 Jul 26 21:56 spark-env.sh.template
Following the instructions from http://docs.aws.amazon.com//ElasticMapReduce/latest/ReleaseGuide/emr-configure-apps.html, I'm trying to add a jar to the driver and executor extraClassPath properties:
[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": ":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar",
      "spark.executor.extraClassPath": ":/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar"
    },
    "Configurations": []
  }
]
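(For context, a cluster with this classification can be created with the AWS CLI roughly as shown below; the cluster name, release label, and instance settings are placeholders, and the JSON above is saved to a local file that is passed via --configurations.)

# placeholder name/release/instance settings; spark-classification.json contains the JSON shown above
aws emr create-cluster \
    --name "TestSparkCluster" \
    --release-label emr-5.0.0 \
    --applications Name=Spark \
    --use-default-roles \
    --instance-type m3.xlarge \
    --instance-count 3 \
    --configurations file://./spark-classification.json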
I don't see any errors when the cluster is created, but the spark-defaults.conf file never appears when I add this config. Here's an ls showing that the file does not exist on the filesystem:
[hadoop@ip-x-x-x-x ~]$ ls -la /etc/spark/conf/
total 64
drwxr-xr-x 2 root root 4096 Oct 4 08:08 .
drwxr-xr-x 3 root root 4096 Oct 4 07:41 ..
-rw-r--r-- 1 root root 987 Jul 26 21:56 docker.properties.template
-rw-r--r-- 1 root root 1105 Jul 26 21:56 fairscheduler.xml.template
-rw-r--r-- 1 root root 2373 Oct 4 07:42 hive-site.xml
-rw-r--r-- 1 root root 2024 Oct 4 07:42 log4j.properties
-rw-r--r-- 1 root root 2025 Jul 26 21:56 log4j.properties.template
-rw-r--r-- 1 root root 7239 Oct 4 07:42 metrics.properties
-rw-r--r-- 1 root root 7239 Jul 26 21:56 metrics.properties.template
-rw-r--r-- 1 root root 865 Jul 26 21:56 slaves.template
-rw-r--r-- 1 root root 1292 Jul 26 21:56 spark-defaults.conf.template
-rwxr-xr-x 1 root root 1563 Oct 4 07:42 spark-env.sh
-rwxr-xr-x 1 root root 3861 Jul 26 21:56 spark-env.sh.template
What am I doing wrong?
So I just tested this on EMR, and the problem is that you have a : in front of your classpath specification:
"spark.driver.extraClassPath": ":/usr/lib/hadoop-lzo/...
needs to be
"spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/....
Note that AWS also puts entries on the classpath by setting extraClassPath, and anything you specify in extraClassPath will overwrite, not append to, those defaults. In other words, make sure your spark.xxx.extraClassPath includes everything that AWS puts there by default.
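Concretely, your classification should look like the following once the leading colon is removed (the rest of the value appears to be the EMR default classpath, with your MySQL connector jar appended at the end):

[
  {
    "Classification": "spark-defaults",
    "Properties": {
      "spark.driver.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar",
      "spark.executor.extraClassPath": "/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/home/hadoop/mysql-connector-java-5.1.39-bin.jar"
    },
    "Configurations": []
  }
]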