A common approach for connecting to third-party systems from Spark is to pass the credentials for those systems as arguments to the Spark script. However, this raises some security questions; see, for example, the question "Bluemix spark-submit -- How to secure credentials needed by my Scala jar".
Is it possible for a Spark job running on Bluemix to see a list of the other processes on the operating system? That is, can a job run the equivalent of ps -awx to inspect the processes running on the Spark cluster and the arguments that were passed to those processes? I'm guessing it was a design goal that this must not be possible, but it would be good to verify this.
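To make the concern concrete, here is a minimal sketch of what such an inspection could look like from code running on a Linux host. It assumes a readable /proc filesystem; the function name list_process_cmdlines is hypothetical, and whether this would reveal anything beyond the job's own processes on the Bluemix service is exactly what is being asked.

```python
import os


def list_process_cmdlines(proc_root="/proc"):
    """Return {pid: [arg, ...]} for every process whose cmdline is readable.

    Assumes a Linux-style proc filesystem; returns {} if proc_root is missing.
    """
    result = {}
    if not os.path.isdir(proc_root):
        return result
    for entry in os.listdir(proc_root):
        # Only purely numeric directory names correspond to process ids.
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:
            # The process exited or we lack permission to read it; skip it.
            continue
        # Arguments are separated by NUL bytes in /proc/<pid>/cmdline.
        args = [a.decode("utf-8", "replace") for a in raw.split(b"\0") if a]
        if args:
            result[int(entry)] = args
    return result


if __name__ == "__main__":
    for pid, args in sorted(list_process_cmdlines().items()):
        print(pid, " ".join(args))
```

Any credential passed as a command-line argument would show up in this output for every process the caller is allowed to read.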
For the Bluemix Apache Spark service, each provisioned Spark service instance is a tenant, and each tenant is isolated from all other tenants. Spark jobs of a given tenant cannot access the files or memory of any other tenant. So even if you could ascertain, say, the id of another tenant through process lists, you could not exploit that; and nothing truly private should be in any such argument anyway. A relevant analogy is that /etc/passwd is world readable, but knowledge of a user id does not, in and of itself, open any doors. In other words, this is not security by obscurity; the actual resources are locked down.
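As an illustration of keeping private values out of arguments, here is a minimal sketch that reads credentials from a file and refuses to use it if group or other users have any access. The function name load_credentials and the JSON layout are assumptions made for this example, not something the Bluemix service prescribes, and the permission check assumes POSIX file modes.

```python
import json
import os
import stat


def load_credentials(path):
    """Load credentials from a JSON file that only its owner can access.

    Raises PermissionError if group or others have any permission bits set.
    """
    mode = os.stat(path).st_mode
    if mode & (stat.S_IRWXG | stat.S_IRWXO):
        raise PermissionError(
            f"{path} must not be accessible by group or others"
        )
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

With this approach the job would receive only the path to the file as an argument, so a process listing exposes the location of the credentials but not their values.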
Beyond this, I understand that the service will add further isolation through containerization sometime in the near future.