apache-spark · intellij-idea · hadoop-yarn

Standalone Spark application in IntelliJ


I am trying to run a Spark application (written in Scala) on a local server for debugging. It seems that YARN is the default master in the Spark version (2.2.1) I have in my sbt build definitions, and according to an error I am consistently getting, there is no Spark/YARN server listening:

Client:920 - Failed to connect to server: 0.0.0.0/0.0.0.0:8032: retries get failed due to exceeded maximum allowed retries number

According to netstat, there is indeed nothing listening on port 8032 on my local server.

How would I typically go about running my Spark application locally, in a way that bypasses this problem? I only need the application to process a small amount of data for debugging, so I would like to be able to run it locally, without relying on specific Spark/YARN installations and setups on the local server; that would be the ideal debug setup.

Is that possible?

My sbt definitions already bring in all the necessary Spark and spark-yarn jars. The problem also reproduces when running the same project from sbt, outside of IntelliJ.
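
For context, a minimal sketch of what such sbt definitions might look like (the module list below is an assumption; only the Spark version 2.2.1 comes from the question):

    // build.sbt (hypothetical sketch)
    scalaVersion := "2.11.12"  // Spark 2.2.x is built against Scala 2.11

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % "2.2.1",
      "org.apache.spark" %% "spark-sql"  % "2.2.1",
      "org.apache.spark" %% "spark-yarn" % "2.2.1"
    )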


Solution

  • You could submit the Spark application in local mode with .master("local[*]") if you only need to test the pipeline with minuscule data.

    Full code:

    import org.apache.spark.sql.SparkSession

    // Run Spark locally, using as many worker threads as there are cores
    val spark = SparkSession
      .builder
      .appName("myapp")
      .master("local[*]")
      .getOrCreate()
    

    For spark-submit, pass --master local[*] as one of the arguments. Refer to this: https://spark.apache.org/docs/latest/submitting-applications.html

    Note: Do not hard-code the master in your codebase; always supply it from the command line. This keeps the application reusable for local/test/Mesos/Kubernetes/YARN/whatever.
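
    As a minimal sketch of that advice (the object name and the spark.master-property fallback check are assumptions, not part of the original answer), the master can be left out of the code and supplied by spark-submit, with an optional local fallback for IDE debugging:

    import org.apache.spark.sql.SparkSession

    object MyApp {
      def main(args: Array[String]): Unit = {
        val builder = SparkSession.builder.appName("myapp")

        // When launched through spark-submit, --master sets the spark.master
        // property, so nothing is hard-coded here. When launched directly
        // from the IDE with no master configured, fall back to local mode.
        val spark =
          if (sys.props.contains("spark.master")) builder.getOrCreate()
          else builder.master("local[*]").getOrCreate()

        // ... small debug pipeline goes here ...

        spark.stop()
      }
    }

    It could then be submitted with, for example, spark-submit --master local[*] --class MyApp <path-to-jar>, or run directly from IntelliJ without any Spark/YARN installation on the machine.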