To start things off, I created a JAR file following the steps in How to build jars from IntelliJ properly?.
My JAR file's path is
out/artifacts/sparkProgram_jar/sparkProgram.jar
My Spark program, broadly, reads a table from MongoDB, transforms it using Spark's MLlib, and writes the result to MySQL. Here is my build.sbt file; a rough sketch of the program itself follows it.
name := "sparkProgram"
version := "0.1"
scalaVersion := "2.12.4"
val sparkVersion = "3.0.0"
val postgresVersion = "42.2.2"
resolvers ++= Seq(
"bintray-spark-packages" at "https://dl.bintray.com/spark-packages/maven",
"Typesafe Simple Repository" at "https://repo.typesafe.com/typesafe/simple/maven-releases",
"MavenRepository" at "https://mvnrepository.com"
)
libraryDependencies ++= Seq(
"org.apache.spark" %% "spark-core" % sparkVersion,
"org.apache.spark" %% "spark-sql" % sparkVersion,
"org.apache.spark" %% "spark-mllib" % sparkVersion,
// logging
"org.apache.logging.log4j" % "log4j-api" % "2.4.1",
"org.apache.logging.log4j" % "log4j-core" % "2.4.1",
"org.mongodb.spark" %% "mongo-spark-connector" % "2.4.1",
//"mysql" % "mysql-connector-java" % "5.1.12",
"mysql" % "mysql-connector-java" % "8.0.18"
)
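For context, the program's shape is roughly the following. This is only a minimal sketch: the Mongo URI, database, table, column names, and credentials are placeholders, not my real values.

package com.testing

import org.apache.spark.sql.SparkSession
import org.apache.spark.ml.feature.StringIndexer

object mainObject {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sparkProgram")
      .config("spark.mongodb.input.uri", "mongodb://localhost/testDb.myCollection")
      .getOrCreate()

    // Read the source collection from MongoDB via the connector.
    val df = spark.read
      .format("com.mongodb.spark.sql.DefaultSource")
      .load()

    // Example MLlib step: index a string column into a numeric one.
    val indexed = new StringIndexer()
      .setInputCol("someStringColumn")
      .setOutputCol("someStringColumnIndex")
      .fit(df)
      .transform(df)

    // Write the transformed result to MySQL over JDBC.
    indexed.write
      .format("jdbc")
      .option("url", "jdbc:mysql://localhost:3306/myDb")
      .option("dbtable", "myResults")
      .option("user", "myUser")
      .option("password", "myPassword")
      .option("driver", "com.mysql.cj.jdbc.Driver")
      .mode("append")
      .save()

    spark.stop()
  }
}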
My main class is in the package com.testing, in a Scala object named mainObject.
When I run the following spark-submit command
spark-submit --master local --class com.testing.mainObject
--packages mysql:mysql-connector-java:8.0.18,org.mongodb.spark:mongo-spark-connector_2.12:2.4.1 out/artifacts/sparkProgram_jar/sparkProgram.jar
I receive this error
Error: Missing application resource.
Usage: spark-submit [options] <app jar | python file | R file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]
Usage: spark-submit run-example [options] example-class [example args]
Options:
... zsh: command not found: --packages
And then when I attempt to run my spark-submit without the --packages option (just to check what would happen), I receive this error.
command:
spark-submit --master local --class com.testing.mainObject out/artifacts/sparkProgram_jar/sparkProgram.jar
error:
Error: Failed to load class com.testing.mainObject
I've used spark-submit before and it worked (a couple of months back), so I'm not sure why it's giving me an error now. My MANIFEST.MF is the following:
Manifest-Version: 1.0
Main-Class: com.testing.mainObject
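As a sanity check, listing the jar's contents shows whether the class actually made it into the jar:

jar tf out/artifacts/sparkProgram_jar/sparkProgram.jar | grep mainObject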
My answer so far was to first build the JAR file differently (in IntelliJ):
File -> Project Structure -> Project Settings -> Artifacts -> Jar
However, instead of choosing "Extract to the target JAR", I clicked on
"Copy to the output directory and link via manifest"
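With that option, IntelliJ copies the dependency jars next to the application jar and references them from the manifest's Class-Path entry, so the resulting MANIFEST.MF looks roughly like this (the exact jar names below are illustrative):

Manifest-Version: 1.0
Class-Path: spark-core_2.12-3.0.0.jar mongo-spark-connector_2.12-2.4.1.jar ...
Main-Class: com.testing.mainObject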
From there, I ran a spark-submit command that did not have the --packages part in it:
spark-submit --class com.testing.mainObject --master local out/artifacts/sparkProgram_jar/sparkProgram.jar
Also, be careful about spacing and about copying and pasting into your terminal. In particular, if a multi-line command is pasted without a trailing backslash on each continued line, zsh treats the next line as a separate command, which is exactly what the "zsh: command not found: --packages" error above means.
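So if you do want to split the original command across lines, each continued line needs a trailing backslash:

spark-submit --master local --class com.testing.mainObject \
--packages mysql:mysql-connector-java:8.0.18,org.mongodb.spark:mongo-spark-connector_2.12:2.4.1 \
out/artifacts/sparkProgram_jar/sparkProgram.jar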
From there I had another error, which is shown here: https://github.com/Intel-bigdata/HiBench/issues/466. The solution is in the comments:
"This seems to happen with hadoop 3. I solved it removing a hadoop-hdfs-2.4.0.jar that was in the classpath."