scala, apache-spark, spark-shell

Is it possible to run a Spark Scala script without going inside spark-shell?


The only two ways I know of to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or to run a Scala script with :load inside the spark-shell. My question is: is it possible to run a Scala file directly from the command line, without first going inside spark-shell and then issuing :load?


Solution

  • You can simply use stdin redirection with spark-shell:

    spark-shell < YourSparkCode.scala
    

    This command starts a spark-shell, interprets YourSparkCode.scala line by line, and quits at the end.
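
    A script fed to spark-shell this way can use the spark session and sc context that the shell creates for you, so there is no need for SparkSession.builder or spark.stop(). A minimal sketch (the computation is made up purely for illustration):

    val df = spark.range(1, 101)                           // demo dataset: numbers 1..100
    val total = df.selectExpr("sum(id)").first().getLong(0)
    println(s"Sum of 1..100 = $total")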

    Another option is to use the -I <file> option of the spark-shell command:

    spark-shell -I YourSparkCode.scala
    

    The only difference is that the latter command leaves you inside the shell, so you must issue the :quit command to close the session.

    Update: Passing parameters

    Since spark-shell does not execute your source as an application, but merely interprets your source file line by line, you cannot pass parameters directly as application arguments.

    Fortunately, there are plenty of ways to achieve the same result (e.g., externalizing the parameters in a separate file and reading it at the very beginning of your script, as sketched below).
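
    As a rough sketch of that file-based approach, assuming a hypothetical side file args.properties containing simple key=value lines (both the file name and the keys are made up for this example):

    import scala.io.Source

    // Parse key=value pairs from a side file; the path and keys are hypothetical.
    val params: Map[String, String] =
      Source.fromFile("args.properties").getLines().filter(_.contains("=")).map { line =>
        val Array(k, v) = line.split("=", 2)
        k.trim -> v.trim
      }.toMap

    val inputPath = params("input.path")                   // hypothetical parameter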

    But personally I find the Spark configuration the cleanest and most convenient way.

    You pass your parameters via the --conf option:

    spark-shell --conf spark.myscript.arg1=val1 --conf spark.yourspace.arg2=val2 < YourSparkCode.scala
    

    (please note that the spark. prefix in the property name is mandatory; otherwise Spark will discard your property as invalid)

    And read these arguments in your Spark code as below:

    val arg1: String = spark.conf.get("spark.myscript.arg1")
    val arg2: String = spark.conf.get("spark.myscript.arg2")
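
    If a parameter is optional, the overload of spark.conf.get that takes a default value is handy, since it falls back to the default instead of failing when the property was not supplied (the property name and default below are made up for illustration):

    // Falls back to "10" when spark.myscript.limit was not passed via --conf.
    val limit: Int = spark.conf.get("spark.myscript.limit", "10").toInt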