apache-spark, hadoop-yarn

Spark on YARN runs twice when the application fails


I'm running Spark on YARN, and when my application hits an error, Spark restarts it automatically.

I want it to run exactly once, whether it succeeds or fails.

Is there a configuration property or an API I can set for this?

I'm using Spark version 1.5.


Solution

  • You have to set the spark.yarn.maxAppAttempts property to 1. Its default value comes from yarn.resourcemanager.am.max-attempts, which is 2 by default.

    Set the property via code:

    import org.apache.spark.SparkConf;
    SparkConf conf = new SparkConf();
    // Allow only one application attempt, i.e. no automatic retry on failure
    conf.set("spark.yarn.maxAppAttempts", "1");
    

    Or set it when submitting the job via spark-submit:

    --conf spark.yarn.maxAppAttempts=1
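
For context, here is a minimal sketch of a complete spark-submit invocation with this flag. The main class (com.example.MyApp) and jar path are placeholders for illustration; only the --conf flag itself comes from the answer above.

    # class name and jar path are placeholders, not from the original answer
    spark-submit \
      --master yarn \
      --deploy-mode cluster \
      --conf spark.yarn.maxAppAttempts=1 \
      --class com.example.MyApp \
      /path/to/my-app.jar

Note that YARN caps the per-application value at the cluster-wide yarn.resourcemanager.am.max-attempts, so setting it higher than that limit has no effect; setting it to 1, as here, disables automatic retries entirely.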