Search code examples
apache-sparkamazon-emr

AWS EMR no host: hdfs:///var/log/spark/apps


I am trying to use AWS EMR (emr-4.3.0) Spark 1.6.0, Hadoop 2.7.0 I created EMR cluster, and added Step(in AWS ERM web) with my sample jar. It's SpringBoot application and written by Java(1.8) (I installed JDK8 in the box)

It run with following command

hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster --class org.springframework.boot.loader.JarLauncher s3://my-test/SparkForSpring-S1.2014.jar

I created SparkContext as following code.

    SparkConf conf = new SparkConf().setAppName("SparkForSpring");
    return new JavaSparkContext(conf);

but it fails with following error, I feel like it's something like not related to my application, I am new to Spark, Yarn though.

Caused by: org.springframework.beans.factory.BeanDefinitionStoreException: Factory method [public org.apache.spark.api.java.JavaSparkContext com.pivotal.demo.spark.rocket.rdd.SparkConfig.javaSparkContext()] threw exception; nested exception is java.io.IOException: Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:188)
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:586)
    ... 49 more
Caused by: java.io.IOException: Incomplete HDFS URI, no host: hdfs:///var/log/spark/apps
    at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:143)
    at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2653)
    at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
    at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
    at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
    at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
    at org.apache.spark.util.Utils$.getHadoopFileSystem(Utils.scala:1650)
    at org.apache.spark.scheduler.EventLoggingListener.<init>(EventLoggingListener.scala:66)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:547)
    at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:59)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig.javaSparkContext(SparkConfig.java:35)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$$82429e1b.CGLIB$javaSparkContext$0(<generated>)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$$82429e1b$$FastClassBySpringCGLIB$$10b15a77.invoke(<generated>)
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:228)
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:312)
    at com.pivotal.demo.spark.rocket.rdd.SparkConfig$$EnhancerBySpringCGLIB$$82429e1b.javaSparkContext(<generated>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:483)
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:166)
    ... 50 more

I read some document but quite not sure what should I do to fix this error. A hint will be greatly helpful.

http://docs.aws.amazon.com/ElasticMapReduce/latest/DeveloperGuide/emr-plan-file-systems.html


Solution

  • I solved this problem by not using SpringBoot's executable jar rather I use maven shade plugin to package only spring related jar files in one jar and using system classloader. Here is full pom.xml

    I got a hint from this question's answer apache-spark 1.3.0 and yarn integration and spring-boot as a container