
How to read app.properties file from Java Spark application


I implemented a Java Spark application, which I run on an EMR cluster with the spark-submit command. I want to pass an app.properties file that I use in my application. app.properties looks as follows:

local_fetcher = false
local_storage = false
local_db = true
.
.
.

I want to be able to get this data in my application. My questions are:

  1. Where should app.properties be located?
  2. How can I read its content in my Spark application?
  3. Should I be able to read it from both the driver and the executors?

I tried to use the --properties-file flag, but I understood that it overrides the default Spark configuration, which is not what I want. I saw that I might use the --files flag, but I didn't understand where the file should be located or how to read it inside my application.


Solution

  • First option: --files

    --files FILES Comma-separated list of files to be placed in the working directory of each executor. File paths of these files in executors can be accessed via SparkFiles.get(fileName).

    spark-submit --files /path/to/app.properties /path/to/your/fat/jar.jar
    

    You can get the exact local path of the uploaded file via SparkFiles.get, as shown in the sketch below.
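    A minimal Java sketch of loading the shipped file with java.util.Properties, assuming it was submitted with --files as above (the PropertiesLoader class name is illustrative):

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.util.Properties;

    import org.apache.spark.SparkFiles;

    public class PropertiesLoader {

        // SparkFiles.get resolves the local path where Spark placed a
        // file that was distributed with --files.
        public static Properties load(String fileName) throws IOException {
            Properties props = new Properties();
            try (FileInputStream in = new FileInputStream(SparkFiles.get(fileName))) {
                props.load(in);
            }
            return props;
        }
    }

    You would then call something like PropertiesLoader.load("app.properties").getProperty("local_db"), either on the driver or inside an executor-side function.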

  • Second option: getResourceAsStream

    Put your app.properties inside your job's JAR file, and load it like this:

    // Load app.properties from the root of the job JAR's classpath
    val appPropertiesSource = scala.io.Source.fromInputStream(
      classOf[YourClass].getResourceAsStream("/app.properties")
    )
    val appPropertiesString = appPropertiesSource.mkString
    

    (note the forward slash before "app.properties"; as far as I remember, it's important)
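    Since the question is about a Java application, here is a rough Java equivalent of the same classpath approach (the AppConfig class name is illustrative):

    import java.io.IOException;
    import java.io.InputStream;
    import java.util.Properties;

    public class AppConfig {

        // With Class.getResourceAsStream, the leading slash means
        // "resolve from the root of the classpath", i.e. from the root
        // of the job JAR where app.properties was packaged.
        public static Properties load() throws IOException {
            Properties props = new Properties();
            try (InputStream in = AppConfig.class.getResourceAsStream("/app.properties")) {
                props.load(in);
            }
            return props;
        }
    }

    Because the file travels inside the JAR, which is shipped to every node, this approach works on the driver and the executors alike.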