So, I connect to my EMR cluster's master node over SSH. This is the file structure on the master node:
|-- AnalysisRunner.scala
|-- AutomatedConstraints.scala
|-- deequ-1.0.1.jar
|-- new
| |-- Auto.scala
| `-- Veri.scala
|-- VerificationConstraints.scala
`-- wget-log
Until now, I would first run

spark-shell --conf spark.jars=deequ-1.0.1.jar

and, once at the Scala prompt, run my script with

:load new/Auto.scala
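For context, a :load-able deequ script such as new/Auto.scala might look something like the sketch below. This is purely illustrative, not the actual file: the input path and the use of constraint suggestion are assumptions, and the block relies on the `spark` session that spark-shell creates implicitly.

```scala
// Illustrative sketch of a deequ script run via :load.
// The input path is invented; `spark` is the SparkSession that
// spark-shell provides implicitly at the prompt.
import com.amazon.deequ.suggestions.{ConstraintSuggestionRunner, Rules}

val data = spark.read.json("s3://my-bucket/input/")  // hypothetical path

// Ask deequ to profile the data and suggest constraints automatically.
val result = ConstraintSuggestionRunner()
  .onData(data)
  .addConstraintRules(Rules.DEFAULT)
  .run()

// Print the suggested constraints, grouped by column.
result.constraintSuggestions.foreach { case (column, suggestions) =>
  suggestions.foreach(s => println(s"$column: ${s.description}"))
}
```

Because spark-shell injects `spark` for you, a top-level script like this works under :load but cannot be handed to spark-submit as-is.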
WHAT I WANT TO DO:
While on my EMR cluster's master node, I would like to run a single spark-submit
command that achieves exactly what the spark-shell workflow above does.
I'm new to this, so could anyone help me with the command?
For any beginner who might be stuck here:
You will need an IDE (I used IntelliJ IDEA) to package your code into a jar. Then submit it with:
spark-submit --class pkg.obj \
  --jars <path to your dependencies (if any)> \
  <path to the jar created from your code> \
  <command line arguments (if any)>
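One thing the template glosses over: a script run via :load executes top-level statements against the `spark` session that spark-shell creates for you, whereas spark-submit only invokes a main method. So the script body must be wrapped in an object that builds its own SparkSession. A minimal sketch, where the package name pkg, object name Obj, and app name are placeholders rather than the asker's actual code:

```scala
package pkg

import org.apache.spark.sql.SparkSession

// Wraps what used to be a :load script in a main method so that
// spark-submit can launch it. Package and object names are placeholders.
object Obj {
  def main(args: Array[String]): Unit = {
    // spark-shell created this session implicitly; under spark-submit
    // we must build it ourselves.
    val spark = SparkSession.builder()
      .appName("deequ-checks")
      .getOrCreate()

    // ... body of your former new/Auto.scala goes here ...

    spark.stop()
  }
}
```

After building a jar from this (for example with sbt package) and copying it to the master node, the command would look something like: spark-submit --class pkg.Obj --jars deequ-1.0.1.jar your-app.jar (your-app.jar is a placeholder name).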
This worked for me. Note: if you are running this on an EMR cluster, make sure every path is specified as either a local path on the master node or an HDFS/S3 URI the cluster can reach.