I am working with a cluster that runs a custom Hadoop 2.4 build, and I am trying to use Talend with the Spark components. For the tSparkConnection component, I have set the relevant SparkHost and SparkHome values.
For the distribution, the two available options are Cloudera and Custom (unsupported). When the Custom (unsupported) distribution is selected, there is an option to choose the custom Hadoop version so that the relevant libraries are included. The options available here are: Cloudera, HortonWorks, MapR, Apache, Amazon EMR, PivotalHD. However, when I choose Cloudera, it comes with Hadoop 2.3. I assume the essential libraries are missing as a result, because I get a "NoClassDefFoundError" and cannot load a file in Spark through this Spark connection. By the way, my Spark version is 1.0.0.
I would like to know how to fix this and how to get this version of Spark running with Hadoop 2.4.
The error is copied and pasted below:
[statistics] connecting to socket on port 3637
[statistics] connected
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/api/java/JavaSparkContext
at sparktest.sparktest_0_1.sparktest.tSparkConnection_2Process(sparktest.java:491)
at sparktest.sparktest_0_1.sparktest.runJobInTOS(sparktest.java:1643)
at sparktest.sparktest_0_1.sparktest.main(sparktest.java:1502)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.api.java.JavaSparkContext
at java.net.URLClassLoader$1.run(URLClassLoader.java:372)
at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:360)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 3 more
[statistics] disconnected
Job sparktest ended at 13:19 21/10/2014. [exit code=1]
Thanks!
Yes, CDH 5.0.0 contains Hadoop 2.3. Hadoop 2.4.0 is on the roadmap, and it sounds like it will be available for CDH 5.x.
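One more thing that may help narrow this down: the class named in the trace, org.apache.spark.api.java.JavaSparkContext, belongs to Spark's own Java API rather than to Hadoop, so it could be worth checking whether the Spark jar is on the job's classpath at all. Below is a minimal sketch of such a check. The class name ClasspathCheck and the helper isOnClasspath are made up for illustration and are not part of Talend or Spark; it assumes you run it with the same classpath the generated job uses.

```java
public class ClasspathCheck {

    /**
     * Hypothetical helper: returns true if the named class can be located
     * by the class loader that loaded this class.
     */
    public static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or something it depends on, could not be loaded
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the class reported in the stack trace; another name can be passed as an argument
        String name = args.length > 0 ? args[0] : "org.apache.spark.api.java.JavaSparkContext";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
        // Print the effective classpath so the jars actually in use can be inspected
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```

If this reports false for JavaSparkContext, the Hadoop version selected in the component is probably not the only thing to look at; the Spark libraries themselves would appear to be missing from the job.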
Best.