
Pass system property to spark-submit and read file from classpath or custom path


I have recently found a way to use logback instead of log4j in Apache Spark (both for local use and spark-submit). However, there is one last piece missing.

The issue is that Spark tries very hard not to see logback.xml settings in its classpath. I have already found a way to load it during local execution:

What I have so far

Basically, I check the system property logback.configurationFile and fall back to the logback.xml from my /src/main/resources/ just in case:

import java.io.File

// the same as default: https://logback.qos.ch/manual/configuration.html
private val LogbackLocation = Option(System.getProperty("logback.configurationFile"))
// add some default logback.xml to your /src/main/resources
private lazy val defaultLogbackConf = getClass.getResource("/logback.xml").getPath

private def getLogbackConfigPath = {
   val path = LogbackLocation.map(new File(_).getPath).getOrElse(defaultLogbackConf)
   logger.info(s"Loading logging configuration from: $path")
   path
}

And then when I initialize my SparkContext...

val sc = SparkContext.getOrCreate(conf)
sc.addFile(getLogbackConfigPath)

I can confirm it works locally.

Playing with spark-submit

spark-submit \
  ...
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2 

This gives an error:

Exception in thread "main" java.io.FileNotFoundException: Added file file:/path/to/my/application-fat.jar!/logback.xml does not exist

Which I think is nonsense, because the application first finds the file (according to my code)

getClass.getResource("/logback.xml").getPath

and then, during

sc.addFile(getLogbackConfigPath)

it turns out... whoa! No file there!? What the heck!? Why would it not find the file inside the jar? It obviously is there, I triple-checked it.

Another approach to spark-submit

So I thought, OK, I will pass the file explicitly by specifying the system property. I put the logback.xml file next to my application-fat.jar and:

spark-submit \
  ...
  --conf spark.driver.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --conf spark.executor.extraJavaOptions="-Dlogback.configurationFile=/path/to/my/logback.xml" \
  --master yarn \
  --class com.company.Main \
  /path/to/my/application-fat.jar \
  param1 param2 

And I get the same error as above. So my setting is completely ignored! Why? How do I specify

-Dlogback.configurationFile

properly and pass it to both the driver and the executors?

Thanks!


Solution

    1. Solving java.io.FileNotFoundException

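    This is probably unsolvable.

    Simply put, SparkContext.addFile cannot read a file from inside the jar. The path reported in the exception (file:/path/to/my/application-fat.jar!/logback.xml) points to an entry inside the jar archive, and I believe it is handled as if it were inside a zip file rather than as a regular file on disk.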
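
To illustrate why the path is not usable as a file, here is a minimal sketch. The URL is built by hand from the placeholder path in the error message, on the assumption that getResource returns a URL of this shape for a resource packed in a jar; the object name JarResourcePathDemo is made up for the example.

```scala
import java.io.File
import java.net.URL

object JarResourcePathDemo {
  // Hypothetical URL, shaped like what getResource returns for a jar entry.
  val resourceUrl = new URL("jar:file:/path/to/my/application-fat.jar!/logback.xml")

  def main(args: Array[String]): Unit = {
    // The protocol is "jar", not "file".
    println(resourceUrl.getProtocol)
    // getPath keeps the nested "file:...!/entry" form seen in the exception.
    println(resourceUrl.getPath)
    // Treated as a filesystem path, that string does not name an existing file.
    println(new File(resourceUrl.getPath).exists)
  }
}
```

The string returned by getPath is an address inside an archive, so anything that expects a plain file path will report that the file does not exist.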

    Fine.
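
That said, one possible workaround (not something I have verified on a cluster, so treat it as a sketch) is to copy the resource out of the jar into a real temporary file and hand that path to addFile. The helper name extractToTempFile and the temp directory prefix are made up for this example.

```scala
import java.nio.file.{Files, Path, StandardCopyOption}

object ResourceExtractor {
  /** Copies a classpath resource into a fresh temp directory, keeping its
    * file name. Returns None when the resource is not on the classpath. */
  def extractToTempFile(resource: String): Option[Path] =
    Option(getClass.getResourceAsStream(resource)).map { in =>
      try {
        // Keep the original file name, e.g. "logback.xml".
        val fileName = resource.substring(resource.lastIndexOf('/') + 1)
        val target = Files.createTempDirectory("extracted-conf").resolve(fileName)
        Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING)
        target
      } finally in.close()
    }
}

// Possible usage with an existing SparkContext `sc`:
//   ResourceExtractor.extractToTempFile("/logback.xml")
//     .foreach(path => sc.addFile(path.toString))
```

The copy lives in a real directory, so its path is a regular filesystem path. Whether logback then picks the file up on the executors is a separate matter that this sketch does not address.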

    2. Passing -Dlogback.configurationFile

    This was not working due to my misunderstanding of the configuration parameters.

    Because I am using --master yarn without specifying --deploy-mode cluster, the deploy mode defaults to client.

    Reading https://spark.apache.org/docs/1.6.1/configuration.html#application-properties

    spark.driver.extraJavaOptions

    Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-java-options command line option or in your default properties file.

    So passing this setting with --driver-java-options worked:

    spark-submit \
      ...
      --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
      --master yarn \
      --class com.company.Main \
      /path/to/my/application-fat.jar \
      param1 param2 
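
To check that the option actually reached the driver JVM, a small sanity check like the following can be run at the start of the application. This is only a sketch; the object name LogbackPropertyCheck and the message wording are made up for the example.

```scala
object LogbackPropertyCheck {
  val Key = "logback.configurationFile"

  // Turns the (possibly missing) property value into a readable message.
  def describe(value: Option[String]): String = value match {
    case Some(path) => s"$Key is set to $path"
    case None       => s"$Key is not set; logback falls back to its default lookup"
  }

  def main(args: Array[String]): Unit =
    // sys.props reflects the -D options the JVM was started with.
    println(describe(sys.props.get(Key)))
}
```

If the message reports that the property is not set, the -D option never made it to the JVM that runs the driver.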
    

    Note about --driver-java-options

    In contrast to --conf, multiple options have to be passed together as a single argument, for example:

    --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml -Dother.setting=value" \
    

    And the following will not work:

    --driver-java-options "-Dlogback.configurationFile=/path/to/my/logback.xml" \
    --driver-java-options "-Dother.setting=value" \