Tags: scala, apache-spark, spark-submit, apache-spark-2.2

Spark History Server - Identify log file that a job writes to


I want to use the Spark History Server API (http://127.0.0.1:18080/api/v1/applications/) to identify the log file in /tmp/spark-events/ that a given job writes to. I can see that the application ID matches the log file name, so I was thinking that if each job had a unique name I could search for it and get the associated ID. My problem is that I have a Scala application which sets the application name in the code:

val conf = new SparkConf()
  .setAppName(s"TeraGen ($size)")

Each time the job runs it has the same name. Is it possible to override the application name from the command line? I tried passing --name to spark-submit, but that doesn't work.

Failing that, is there a better way to do this?


Solution

  • I passed a uuId as a command-line argument by adding the following to my code and assigning it to a variable:

    val uuId = args(2)
    

    I then added it to the application name with:

    val conf = new SparkConf()
      .setAppName(s"TeraGen ($size) $uuId")
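With the uuId embedded in the application name, the History Server's application list can be scanned for it to recover the application ID (and hence the event-log file name). A minimal sketch of that lookup is below; `HistoryLookup` and `findAppId` are hypothetical names, the endpoint is the one from the question, and the JSON is scanned with a simple regex rather than a proper JSON library:

```scala
import scala.io.Source

object HistoryLookup {
  // Extract the "id" of the first application whose "name" contains uuId.
  // Minimal regex scan over the History Server JSON; a real client
  // would parse the response with a JSON library instead.
  def findAppId(json: String, uuId: String): Option[String] = {
    val entry = """"id"\s*:\s*"([^"]+)"[^}]*"name"\s*:\s*"([^"]+)"""".r
    entry.findAllMatchIn(json)
      .collectFirst { case m if m.group(2).contains(uuId) => m.group(1) }
  }

  def main(args: Array[String]): Unit = {
    val uuId = args(0)
    // Endpoint from the question; adjust host/port for your History Server.
    val json = Source.fromURL("http://127.0.0.1:18080/api/v1/applications/").mkString
    findAppId(json, uuId) match {
      case Some(id) => println(s"/tmp/spark-events/$id")
      case None     => println(s"No application found for uuId $uuId")
    }
  }
}
```

Run with the same uuId that was passed to the job, and it prints the matching event-log path (assuming the default /tmp/spark-events/ log directory).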