Tags: apache-spark, apache-spark-sql, oozie, emr, hue

Can't instantiate SparkSession on EMR 5.0 HUE


I'm running an EMR 5.0 cluster and I'm using Hue to create an Oozie workflow that submits a Spark 2.0 job. I have run the job with spark-submit directly on YARN, and as a step on the same cluster, with no problem. But when I submit it through Hue I get the following error:

java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SessionState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:111)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:110)
    at org.apache.spark.sql.SparkSession.conf$lzycompute(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession.conf(SparkSession.scala:133)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:838)
    at org.apache.spark.sql.SparkSession$Builder$$anonfun$getOrCreate$5.apply(SparkSession.scala:838)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
    at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
    at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
    at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:838)
    at be.infofarm.App$.main(App.scala:22)
    at be.infofarm.App.main(App.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:627)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:946)
    ... 19 more
Caused by: java.lang.IllegalArgumentException: Error while instantiating 'org.apache.spark.sql.internal.SharedState':
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:949)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:100)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:100)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:99)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:98)
    at org.apache.spark.sql.internal.SessionState.<init>(SessionState.scala:153)
    ... 24 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$reflect(SparkSession.scala:946)
    ... 30 more
Caused by: java.lang.Exception: Could not find resource path for Web UI: org/apache/spark/sql/execution/ui/static
    at org.apache.spark.ui.JettyUtils$.createStaticHandler(JettyUtils.scala:182)
    at org.apache.spark.ui.WebUI.addStaticHandler(WebUI.scala:119)
    at org.apache.spark.sql.execution.ui.SQLTab.<init>(SQLTab.scala:32)
    at org.apache.spark.sql.internal.SharedState$$anonfun$createListenerAndUI$1.apply(SharedState.scala:96)
    at org.apache.spark.sql.internal.SharedState$$anonfun$createListenerAndUI$1.apply(SharedState.scala:96)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.sql.internal.SharedState.createListenerAndUI(SharedState.scala:96)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:44)
    ... 35 more

When my job uses only SparkContext (no spark.sql and no SparkSession), it runs fine. If anyone has any clue what is going on, I would be very grateful.
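For reference, the failing line at `App.scala:22` presumably boils down to a minimal sketch like the following (the package and object names are taken from the stack trace; the app name and query are made-up placeholders):

```scala
package be.infofarm

import org.apache.spark.sql.SparkSession

object App {
  def main(args: Array[String]): Unit = {
    // getOrCreate() is where the trace fails with
    // "Error while instantiating 'org.apache.spark.sql.internal.SessionState'"
    // when the job is launched through the Hue/Oozie Spark action on EMR 5.0.
    val spark = SparkSession.builder()
      .appName("example")   // placeholder app name
      .getOrCreate()

    // Any spark.sql usage triggers the SessionState/SharedState instantiation.
    spark.sql("SELECT 1").show()

    spark.stop()
  }
}
```

The root cause in the trace ("Could not find resource path for Web UI: org/apache/spark/sql/execution/ui/static") suggests the spark-sql jars are simply not on the classpath of the Oozie-launched container, which is consistent with the sharelib fix in the accepted answer below.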

EDIT 1

My Maven build configuration (assembly setup):

  <build>
<sourceDirectory>src/main/scala</sourceDirectory>
<testSourceDirectory>src/test/scala</testSourceDirectory>
<plugins>
  <plugin>
    <groupId>net.alchim31.maven</groupId>
    <artifactId>scala-maven-plugin</artifactId>
    <version>3.1.3</version>
    <executions>
      <execution>
        <goals>
          <goal>compile</goal>
          <goal>testCompile</goal>
        </goals>
        <configuration>
          <args>
            <arg>-dependencyfile</arg>
            <arg>${project.build.directory}/.scala_dependencies</arg>
          </args>
        </configuration>
      </execution>
    </executions>
  </plugin>

  <plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <configuration>
      <archive>
        <manifest>
          <mainClass>be.infofarm.App</mainClass>
        </manifest>
      </archive>
      <descriptorRefs>
        <descriptorRef>jar-with-dependencies</descriptorRef>
      </descriptorRefs>
    </configuration>
    <executions>
      <execution>
        <id>make-assembly</id> <!-- this is used for inheritance merges -->
        <phase>package</phase> <!-- bind to the packaging phase -->
        <goals>
          <goal>single</goal>
        </goals>
      </execution>
    </executions>
  </plugin>
  </plugins>
</build>


Solution

  • When you run the jar with spark-submit, all dependent jars are available on the machine's classpath, but when you execute the same workflow through Oozie, those jars are not available in Oozie's sharelib. You can check what the Spark sharelib currently contains by executing the following command:

    oozie admin -shareliblist spark
    

    Step 1. Upload the required jars from the local machine to HDFS (replace `lib_timestamp` with the actual timestamped sharelib directory on your cluster):

    hdfs dfs -put /usr/lib/spark/jars/*.jar /user/oozie/share/lib/lib_timestamp/spark/ 
    

    Step 2. Just uploading the jars to HDFS won't add them to the sharelib; you also need to tell Oozie to refresh its sharelib by executing:

    oozie admin -sharelibupdate
    

    Hope this helps.
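Putting the steps together, an end-to-end sketch of the fix might look like the script below. Note the Oozie URL, the HDFS paths, and the way the timestamped sharelib directory is resolved are all assumptions about a typical EMR 5.x layout, not values from the original answer:

```shell
#!/bin/sh
# Assumed Oozie server URL on the EMR master node (adjust for your cluster).
OOZIE_URL="http://localhost:11000/oozie"

# 1. Inspect what the Spark sharelib currently contains.
oozie admin -oozie "$OOZIE_URL" -shareliblist spark

# 2. Resolve the latest timestamped sharelib directory (e.g. lib_20160921070822);
#    this lexicographic sort works because the directory names embed a timestamp.
SHARELIB_DIR=$(hdfs dfs -ls /user/oozie/share/lib | awk '{print $NF}' | sort | tail -n 1)

# 3. Copy the cluster's Spark jars into the sharelib's spark directory.
hdfs dfs -put /usr/lib/spark/jars/*.jar "$SHARELIB_DIR/spark/"

# 4. Ask Oozie to re-scan the sharelib so the new jars are picked up.
oozie admin -oozie "$OOZIE_URL" -sharelibupdate

# 5. Verify that the jars now appear in the listing.
oozie admin -oozie "$OOZIE_URL" -shareliblist spark
```

Running the `-shareliblist spark` command before and after makes it easy to confirm that the spark-sql jars missing from the failing container are now part of the sharelib.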