I know I should add it as a package requirement when launching PySpark:
$SPARK_HOME/bin/pyspark --packages com.databricks:spark-csv_2.11:1.4.0
But in Bluemix, Spark is already running and a SparkContext is already defined. How can I add this package?
On a side note, would I be able to do this in Scala?
Currently on Bluemix, using PySpark in a Python notebook, it is not possible to add spark-csv to the environment.
However, you can add it in a Scala notebook using this command:
%AddDeps com.databricks spark-csv_2.10 1.3.0 --transitive
Of course, you may choose another version of the package.
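Once the dependency is loaded, you can use it through the standard spark-csv read API. Here is a minimal sketch for a Scala notebook, assuming the `%AddDeps` command above succeeded, that the notebook provides a `SparkContext` named `sc` (as Bluemix notebooks do), and that the file path is a hypothetical placeholder for your own data:

```scala
// Sketch: read a CSV file via the spark-csv package in a Scala notebook.
// Assumes `sc` (SparkContext) is already defined by the notebook environment.
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

val df = sqlContext.read
  .format("com.databricks.spark.csv") // data source provided by spark-csv
  .option("header", "true")           // treat the first line as column names
  .option("inferSchema", "true")      // infer column types instead of all strings
  .load("swift://notebooks.spark/mydata.csv") // hypothetical path, replace with yours

df.printSchema()
```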
For Scala notebooks and the corresponding Spark kernel, have a look at the following documentation: https://github.com/ibm-et/spark-kernel/wiki/List-of-Current-Magics-for-the-Spark-Kernel