Tags: r · apache-spark · hadoop-yarn · sparkr

How to set a YARN queue from within a SparkR session?


If I'm initializing a Spark session using SparkR (not spark-submit), like this...

library(SparkR, lib.loc = c(file.path(Sys.getenv("SPARK_HOME"), "R", "lib")))
sparkR.session()

is there a way to set a queue? I tried something like this:

sparkR.session(queue = "queue_name") 

but it didn't seem to work. The only way I've successfully used a queue from within SparkR is with the deprecated sparkR.init() function:

sc <- SparkR::sparkR.init(master = "yarn-client", sparkEnvir = list(spark.yarn.queue="queue-name")) 
hiveContext <- sparkRHive.init(sc)

But that throws a warning: 'SparkR::sparkR.init' is deprecated.

How does this translate to sparkR.session()?


Solution

  • When you start SparkR, a Spark session is already created. You need to stop the current session and start a new one with the desired settings.

    I use the following:

    sparkR.stop()                         # stop the session created at startup
    sparkR.session(
        # master="local[2]",              # local master
        master="yarn",                    # cluster master
        appName="my_sparkR",
        sparkConfig=list(
            spark.driver.memory="4g",     # driver JVM memory
            spark.executor.memory="2g",   # memory per executor
            spark.yarn.queue="your_desired_queue"  # YARN queue to submit to
        )
    )
    

    Verify on the Spark UI (the application's monitoring page) that the settings took effect.
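
    You can also confirm from inside the session itself: SparkR's sparkR.conf() reads the runtime configuration of the active session. A minimal sketch, assuming the session created above is still active:

    # Read back a single setting; this should return the queue name set above
    sparkR.conf("spark.yarn.queue")

    # Or fetch the full runtime config as a named list and inspect it
    conf <- sparkR.conf()
    conf[["spark.yarn.queue"]]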