Tags: java, apache-spark, hadoop-yarn

Spark - Call Spark jar from java with arguments


I would like to call a Spark jar from Java (to run a Spark process on YARN), and tried to use the code from this link.

It looks like a fit for my case, but I need to pass a HashMap and some Java values to the Spark jar. Is it possible to pass a Java object to the Spark jar?

Also, can the Java side find out how far along the Spark job is, or whether it is done? If so, how?


Solution

  • I think you misunderstood the content given in data-algorithms.

    There are two ways to submit a job:

    1) spark-submit, like the below example from a shell script:

    cat run_secondarysorting.sh
    
    #!/bin/bash
    export JAVA_HOME=/usr/java/jdk7
    export SPARK_HOME=/home/hadoop/spark-1.1.0
    export SPARK_MASTER=spark://myserver100:7077
    BOOK_HOME=/home/mp/data-algorithms-book
    APP_JAR=$BOOK_HOME/dist/data_algorithms_book.jar
    INPUT=/home/hadoop/testspark/timeseries.txt
    # Run on a Spark standalone cluster
    prog=org.dataalgorithms.chap01.spark.SparkSecondarySort
    $SPARK_HOME/bin/spark-submit \
    --class $prog \
    --master $SPARK_MASTER \
    --executor-memory 2G \
    --total-executor-cores 20 \
    $APP_JAR
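
    The same invocation can be made from Java instead of a shell script. Below is a minimal, hypothetical sketch using the JDK's `ProcessBuilder`; the paths and class names are taken from the script above and are assumptions, not a fixed API. Extra application arguments for your jar simply go after the app jar on the command line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: build the spark-submit command line from Java.
// Paths and class names below are assumptions based on the shell script above.
public class SparkSubmitFromJava {

    // Assemble the spark-submit command; extra application arguments
    // (the values you want to hand to the Spark jar) go after the app jar.
    public static List<String> buildCommand(String sparkHome,
                                            String master,
                                            String mainClass,
                                            String appJar,
                                            String... appArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList(
                sparkHome + "/bin/spark-submit",
                "--class", mainClass,
                "--master", master,
                "--executor-memory", "2G",
                "--total-executor-cores", "20",
                appJar));
        cmd.addAll(Arrays.asList(appArgs));
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand(
                "/home/hadoop/spark-1.1.0",
                "spark://myserver100:7077",
                "org.dataalgorithms.chap01.spark.SparkSecondarySort",
                "/home/mp/data-algorithms-book/dist/data_algorithms_book.jar",
                "/home/hadoop/testspark/timeseries.txt");
        System.out.println(String.join(" ", cmd));
        // To actually launch it (not done here), you would run:
        //   Process p = new ProcessBuilder(cmd).inheritIO().start();
        //   int exitCode = p.waitFor();  // 0 means spark-submit finished OK
    }
}
```

    The exit code from `waitFor()` tells you when the job is done. Newer Spark versions also ship `org.apache.spark.launcher.SparkLauncher`, which submits the job programmatically and exposes its state through a `SparkAppHandle`.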
    

    2) From the Yarn Client, which was described in the link.

    The Yarn Client is used when you want to submit Spark jobs from Java code (such as Java servlets or other Java code like REST servers).

    When you are calling this Yarn Client, you need to call it as a method in your REST service or servlet, etc. (i.e. through the web), in which you can also pass parameters like a HashMap or any other kind of Java object.

    For demo purposes, the author has written a standalone client (with a public static void main).
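
    On passing a HashMap: Spark application arguments are plain strings, so a live Java object cannot be handed to the jar directly. One simple approach is to flatten the map into `key=value` arguments on the caller side and parse them back inside the Spark job's `main()`. This is an illustrative sketch; the class and method names are made up, not from the original post.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: encode a Map as String[] args for the Spark jar,
// and decode it back inside the jar's main(). Names here are hypothetical.
public class ArgCodec {

    // Caller side (servlet / REST service): flatten the map into args.
    public static String[] encode(Map<String, String> map) {
        return map.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .toArray(String[]::new);
    }

    // Spark-jar side (inside main(String[] args)): parse the args back.
    public static Map<String, String> decode(String[] args) {
        Map<String, String> map = new LinkedHashMap<>();
        for (String arg : args) {
            int i = arg.indexOf('=');
            map.put(arg.substring(0, i), arg.substring(i + 1));
        }
        return map;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("input", "/home/hadoop/testspark/timeseries.txt");
        params.put("mode", "yarn");
        String[] encoded = encode(params);           // pass these as app args
        Map<String, String> roundTrip = decode(encoded);
        System.out.println(roundTrip);
    }
}
```

    For objects more complex than flat key/value pairs, a common alternative is to serialize them to JSON (or write them to HDFS) and pass the resulting string or path as a single argument.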

    Hope you understood.