Tags: r, scala, apache-spark, sparklyr

sparklyr hadoopConfiguration


I apologize that this question will be hard to make fully reproducible because it involves a running spark context (referred to as sc below), but I am trying to set a hadoopConfiguration in sparklyr, specifically for accessing swift/objectStore objects from RStudio via sparklyr as a Spark object, but more generally for a Scala-style call to hadoopConfiguration. Something like (Scala code):

sc.hadoopConfiguration.set(f"fs.swift.service.$name.auth.url", "https://identity.open.softlayer.com/v3/auth/tokens")

where sc is a running spark context. In SparkR I can run (R code)

hConf <- SparkR:::callJMethod(sc, "hadoopConfiguration")
SparkR:::callJMethod(hConf, "set", "fs.swift.service.keystone.auth.url",
                     "https://identity.open.softlayer.com/v3/auth/tokens")

in sparklyr I have tried every incantation of this that I can think of, but my best guess is (again R code)

sc %>% invoke("set", "fs.swift.service.keystone.auth.url",
              "https://identity.open.softlayer.com/v3/auth/tokens")

but this results in the terse (and irregularly spelled) error

Error in enc2utf8(value) : argumemt is not a character vector

Of course I tried to encode the inputs in every way that I can think of (naturally enc2utf8(value) being the first, but many others, including lists and as.character(as.list(...)), which appears to be a favorite of sparklyr coders). Any suggestions would be greatly appreciated. I have combed the source code for sparklyr and cannot find any mention of hadoopConfiguration in the sparklyr GitHub repository, so I am afraid that I am missing something very basic in the core configuration. I have also tried to pass these configs via config.yml in the core spark_connect() call; while this does set "fs.swift.service.keystone.auth.url" as an sc$config$fs.swift.service.keystone.auth.url entry, it apparently fails to set it on the core hadoopConfiguration.
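For concreteness, the config-based attempt looked roughly like this (a reconstructed sketch of what I describe above; the master value is just a placeholder):

conf <- sparklyr::spark_config()

# this puts the key into sc$config, but (as noted above) it does not
# appear to propagate to the underlying hadoopConfiguration
conf$fs.swift.service.keystone.auth.url <-
  "https://identity.open.softlayer.com/v3/auth/tokens"

sc <- sparklyr::spark_connect(master = "local", config = conf)  # placeholder master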

By the way, I am using Spark 1.6, Scala 2.10, R 3.2.1, and sparklyr 0.4.19.


Solution

  • I figured this out:

    set_swift_config <- function(sc){
      # get a reference to the underlying spark context
      ctx <- spark_context(sc)
    
      # wrap it in a JavaSparkContext so its methods can be invoked
      jsc <- invoke_static(
        sc,
        "org.apache.spark.api.java.JavaSparkContext",
        "fromSparkContext",
        ctx
      )
    
      # set the swift configs on the hadoopConfiguration:
      hconf <- jsc %>% invoke("hadoopConfiguration")
      hconf %>% invoke("set", "fs.swift.service.keystone.auth.url",
                       "https://identity.open.softlayer.com/v3/auth/tokens")
    }
    

    which can be run with set_swift_config(sc).
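
    A usage sketch, assuming the remaining fs.swift.service.keystone.* credentials (tenant, username, password, region) have been set the same way, and with placeholder container and object names:

    set_swift_config(sc)
    
    # "mycontainer" and "data.csv" are placeholders for your own objects
    df <- spark_read_csv(sc, name = "swift_data",
                         path = "swift://mycontainer.keystone/data.csv")

    It may also be possible to skip the JavaSparkContext detour and call invoke("hadoopConfiguration") directly on spark_context(sc); the version above is the one I verified.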