python, apache-spark, ibm-cloud, pyspark

How do I add the Databricks spark-csv package to a Python Jupyter notebook on IBM Bluemix?


I know I should add it as a package requirement when launching pyspark:

$SPARK_HOME/bin/pyspark --packages com.databricks:spark-csv_2.11:1.4.0

But in Bluemix, spark is already running and a spark context is already defined. How can I add this package?

On a side note, would I be able to do this in Scala?


Solution

  • Currently on Bluemix, when using PySpark in a Python notebook, it is not possible to add spark-csv to the environment.

    However, you can add it in a Scala notebook using this command:

    %AddDeps com.databricks spark-csv_2.10 1.3.0 --transitive
    

    Of course, you may choose another version of the package.
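    For example, to pull in spark-csv 1.4.0 instead (the Scala 2.10 artifact for that version is published on Maven Central):

    %AddDeps com.databricks spark-csv_2.10 1.4.0 --transitive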

    For Scala notebooks and the corresponding Spark kernel, have a look at the following documentation: https://github.com/ibm-et/spark-kernel/wiki/List-of-Current-Magics-for-the-Spark-Kernel
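
    Once the dependency has been loaded with %AddDeps, the data source is available to the notebook's pre-defined SQLContext. A minimal Scala sketch of reading a CSV file (the file path and options below are illustrative, not from the original answer):

    // Read a CSV file into a DataFrame via the spark-csv data source.
    // "my-data.csv" is a placeholder path; point it at your own file.
    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")       // treat the first line as column names
      .option("inferSchema", "true")  // attempt to detect column types
      .load("my-data.csv")

    df.printSchema()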