Tags: hadoop, apache-spark, mapr, spark-submit

Spark YARN mode: how to get the applicationId from spark-submit


When I submit a Spark job using spark-submit with --master yarn and --deploy-mode cluster, it doesn't print or return the applicationId, and once the job is completed I have to manually check the MapReduce JobHistory or the Spark HistoryServer to get the job details.
My cluster is used by many users, and it takes a lot of time to spot my job in the JobHistory/HistoryServer.

Is there any way to configure spark-submit to return the applicationId?

Note: I found many similar questions, but their solutions retrieve the applicationId within the driver code using sparkContext.applicationId. With --master yarn and --deploy-mode cluster, the driver itself runs on the cluster as part of the YARN application, so anything it logs or prints ends up in the remote host's logs.
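
For context, the driver-side approach those questions describe looks roughly like the sketch below (app name is a placeholder). On YARN in cluster mode the printed value only shows up in the remote driver's log, which is exactly the problem:

```scala
import org.apache.spark.sql.SparkSession

object AppIdInDriver {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("my-job").getOrCreate()

    // Yields something like "application_1530000000000_0001" on YARN,
    // but in cluster mode this println goes to the remote driver's log,
    // not to the terminal where spark-submit was run.
    println(s"applicationId = ${spark.sparkContext.applicationId}")

    // ... actual job ...
    spark.stop()
  }
}
```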


Solution

  • Here are the approaches I used to achieve this (a sketch of both is shown after the list):

    1. Save the applicationId to an HDFS file (suggested by @zhangtong in a comment).
    2. Send an email alert with the applicationId from the driver.
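
For illustration, a minimal sketch of both approaches in Scala. The HDFS path, SMTP host, and addresses are placeholders and not part of the original answer:

```scala
import java.util.Properties
import javax.mail.{Message, Session, Transport}
import javax.mail.internet.{InternetAddress, MimeMessage}

import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.spark.sql.SparkSession

object NotifyApplicationId {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("my-job").getOrCreate()
    val appId = spark.sparkContext.applicationId // e.g. application_1530000000000_0001

    // 1. Save the applicationId to a well-known HDFS file (path is a placeholder).
    val fs  = FileSystem.get(spark.sparkContext.hadoopConfiguration)
    val out = fs.create(new Path("/user/me/my-job/latest_app_id"), true)
    out.writeBytes(appId + "\n")
    out.close()

    // 2. Send an email alert containing the applicationId (SMTP details are placeholders).
    val props = new Properties()
    props.put("mail.smtp.host", "smtp.example.com")
    val session = Session.getInstance(props)
    val msg = new MimeMessage(session)
    msg.setFrom(new InternetAddress("jobs@example.com"))
    msg.setRecipients(Message.RecipientType.TO, "me@example.com")
    msg.setSubject(s"Spark job started: $appId")
    msg.setText(s"YARN applicationId: $appId")
    Transport.send(msg)

    // ... actual job ...
    spark.stop()
  }
}
```

With the first approach, the submitting user can then read the applicationId directly (e.g. `hdfs dfs -cat /user/me/my-job/latest_app_id`) instead of searching the JobHistory/HistoryServer.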