I am trying to run a Hadoop job on AWS Elastic MapReduce using a JAR file. I am using a library called EJML (https://code.google.com/p/efficient-java-matrix-library/wiki/EjmlManual). I included it in my project as an external library using Project --> Build Path --> Configure Build Path --> Add External JARs in Eclipse. When I run the project on my local computer everything is fine. However, on AWS I get this error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/ejml/simple/SimpleBase
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.util.RunJar.main(RunJar.java:180)
Caused by: java.lang.ClassNotFoundException: org.ejml.simple.SimpleBase
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
... 3 more
I am wondering what could be going wrong. I had to rebuild the library to target Java 6 instead of 7 because Hadoop on AWS only runs on Java 6. Any help or suggestions would be appreciated. Thanks.
EDIT: An easy way to solve the problem in Eclipse is to choose the "Runnable JAR file" export option when exporting the project to a JAR.
The third-party dependency isn't included in the job JAR by default, hence the error you are seeing. It works when run from Eclipse because Eclipse knows to add the external JAR to the classpath at execution time.
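If you want to confirm that this is what is happening, a small check along these lines may help. It is only a sketch: ClasspathProbe is a hypothetical helper name (not part of Hadoop or EJML), and the default class name is simply the one from your stack trace.

```java
// Diagnostic sketch: reports whether a class can be loaded from the current classpath.
// ClasspathProbe is a hypothetical helper, not part of Hadoop or EJML.
final class ClasspathProbe {

    // Returns true if the named class is visible to this class's class loader.
    static boolean isOnClasspath(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            // The class itself is not on the classpath
            return false;
        } catch (LinkageError e) {
            // The class was found, but something it depends on is missing or incompatible
            return false;
        }
    }

    public static void main(String[] args) {
        // Default is the class named in the stack trace; pass another name as the first argument
        String name = args.length > 0 ? args[0] : "org.ejml.simple.SimpleBase";
        String verdict = isOnClasspath(name) ? " is" : " is NOT";
        System.out.println(name + verdict + " on the classpath");
    }
}
```

Running it with the same classpath the job sees (for example, only your exported job JAR plus this class) should report "is NOT" if EJML was left out of the JAR, and "is" once the dependency is actually available.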
You have two choices:
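Use the -libjars argument combined with the ToolRunner method for submitting jobs. This ensures your third-party JARs are submitted with your job. Note that -libjars is a generic option, so it is only honored when the driver parses its arguments through ToolRunner: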
hadoop jar myJar.jar MainClass -libjars ejml.jar