I am having some problems configuring Hadoop with SparkR in order to read/write data from Amazon S3.
For example, these are the commands that work in PySpark (to solve the same issue):
sc._jsc.hadoopConfiguration().set("fs.s3n.impl","org.apache.hadoop.fs.s3native.NativeS3FileSystem")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsAccessKeyId", "myaccesskey")
sc._jsc.hadoopConfiguration().set("fs.s3n.awsSecretAccessKey", "mysecretaccesskey")
sc._jsc.hadoopConfiguration().set("fs.s3n.endpoint", "myentrypoint")
Could anybody help me work this out?
A solution closer to what you are doing with PySpark can be achieved by using callJMethod
(https://github.com/apache/spark/blob/master/R/pkg/R/backend.R#L31):
> hConf = SparkR:::callJMethod(sc, "hadoopConfiguration")
> SparkR:::callJMethod(hConf, "set", "a", "b")
NULL
> SparkR:::callJMethod(hConf, "get", "a")
[1] "b"
UPDATE: hadoopConfiguration didn't work for me; conf worked instead - presumably the method name changed at some point.
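If that is the case for your version, the sketch above would only change in the method name passed to callJMethod (this is an assumption based on the update, not verified against every Spark release):

hConf <- SparkR:::callJMethod(sc, "conf")
SparkR:::callJMethod(hConf, "set", "fs.s3n.awsAccessKeyId", "myaccesskey")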