I am trying to run a Spark+Scala app on Bluemix using spark-submit.sh. So far, based on the documentation and source code, I have come up with the following snippet:
    val spark: SparkSession = SparkSession
      .builder
      .appName("app")
      .config("spark.hadoop.fs.cos.softlayer.endpoint",
        "s3-api.us-geo.objectstorage.service.networklayer.com")
      .config("spark.hadoop.fs.cos.softlayer.access.key",
        "auto-generated-apikey-<redacted>")
      .config("spark.hadoop.fs.cos.softlayer.secret.key",
        "<redacted>")
      .getOrCreate()
    spark.sparkContext.setLogLevel("TRACE")
    spark.sparkContext.textFile("s3d://<bucket>.softlayer/<file>")
which fails with
    Exception in thread "Driver" java.lang.NullPointerException
        at com.ibm.stocator.fs.common.ObjectStoreGlobber.glob(ObjectStoreGlobber.java:179)
        at com.ibm.stocator.fs.ObjectStoreFileSystem.globStatus(ObjectStoreFileSystem.java:443)
        at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:259)
due to
DEBUG apache.http.headers: http-outgoing-0 << HTTP/1.1 403 Forbidden
I believe 403 means "authentication was successful, but authorization was not", yet I still get a 403 even if I change my credentials to something random.
I configured my service account as a Reader for all 'cloud-object-storage' resources.
The same credentials work fine for me in Python.
What am I missing?
Unfortunately, the current documentation for the AE beta refers to the IaaS version of COS, which uses AWS-style (HMAC) credentials for authentication instead of the API key provided by IBM Cloud IAM. Support for HMAC credentials in IAM-enabled COS is coming later this year.
The AE docs should be updated soon with examples of using an API key to connect to COS. In the meantime, try this configuration syntax:
    .config("spark.hadoop.fs.cos.iamservice.iam.endpoint",
      "https://iam.ng.bluemix.net/oidc/token")
    .config("spark.hadoop.fs.cos.iamservice.endpoint",
      "s3-api.us-geo.objectstorage.service.networklayer.com")
    .config("spark.hadoop.fs.cos.iamservice.iam.api.key",
      "<api-key>")
    .config("spark.hadoop.fs.cos.iamservice.iam.service.id",
      "<resource-instance-id>")