Tags: python, apache-spark, ipython, ibm-cloud, jupyter

How can I reference libraries for Apache Spark using an IPython Notebook only?


I'm currently playing around with the Apache Spark Service in IBM Bluemix. There is a quick-start composite application (Boilerplate) consisting of the Spark service itself, an OpenStack Swift service for the data, and an IPython/Jupyter Notebook.

I want to add some 3rd-party libraries to the system and I'm wondering how this can be achieved. Using a Python import statement doesn't really help, since the libraries are then expected to already be installed on the Spark worker nodes.

Is there a way of loading Python libraries into Spark from an external source at job runtime (e.g. a Swift or FTP source)?

thanks a lot!


Solution

  • You cannot add 3rd-party libraries at this point in the beta. This will most certainly be coming later in the beta, as it's a popular requirement ;-)
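
For reference, on a standard (non-Bluemix) PySpark deployment the usual way to ship a dependency to the workers at job runtime is SparkContext.addPyFile, which accepts local paths as well as HTTP/HTTPS/FTP URIs. The sketch below assumes a plain PySpark setup and a hypothetical library URL; whether this works against the Bluemix beta service is not confirmed by the answer above.

    # Minimal sketch for a standard PySpark deployment (not verified on the
    # Bluemix beta Spark service).
    from pyspark import SparkContext

    # In a notebook the context is often already provided as `sc`;
    # otherwise create one explicitly.
    sc = SparkContext(appName="third-party-libs-demo")

    # addPyFile copies the .py/.zip/.egg to every worker and puts it on
    # their PYTHONPATH. The URL below is a hypothetical placeholder.
    sc.addPyFile("https://example.com/path/to/mylib.zip")

    def use_lib(x):
        # The import is resolved on the worker from the shipped archive.
        import mylib              # hypothetical module inside mylib.zip
        return mylib.transform(x)  # hypothetical function

    result = sc.parallelize(range(10)).map(use_lib).collect()

The same mechanism is exposed at submit time as the --py-files option of spark-submit.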