I have Apache Toree installed following the instructions at https://medium.com/@faizanahemad/machine-learning-with-jupyter-using-scala-spark-and-python-the-setup-62d05b0c7f56.
However, I cannot import packages in the PySpark kernel by setting the PYTHONPATH variable in the kernel file at:
/usr/local/share/jupyter/kernels/apache_toree_pyspark/kernel.json
In the notebook I can see the required .zip in sys.path and in os.environ['PYTHONPATH'], and the relevant .jar is in os.environ['SPARK_CLASSPATH'], but I get
“No module named graphframe” when importing it with: import graphframe.
Any suggestion on how to get graphframe imported?
Thank you.
I was using the .zip from the graphframes download page, but it does not solve the problem. The correct .zip can be created by following the steps in:
https://github.com/graphframes/graphframes/issues/172
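To tell a usable .zip from an unusable one, it may help to check whether the archive has the package directory at its top level, since that is what Python's zip import needs. Below is a minimal sketch of such a check. The helper name zips_providing is hypothetical (it is not part of Spark, Toree or graphframes), and I am assuming the package directory inside the archive is called graphframes.

```python
import sys
import zipfile


def zips_providing(package, paths=None):
    """Return the .zip entries in `paths` that hold `package` at the archive root.

    Hypothetical helper for illustration; `paths` defaults to sys.path.
    """
    paths = sys.path if paths is None else paths
    hits = []
    for entry in paths:
        # Skip directories, missing files and anything that is not a real zip.
        if not entry.endswith(".zip") or not zipfile.is_zipfile(entry):
            continue
        with zipfile.ZipFile(entry) as zf:
            names = set(zf.namelist())
        # Zip import only finds <package>/__init__.py or <package>.py at the root,
        # not a copy nested inside some wrapper directory.
        if "{}/__init__.py".format(package) in names or "{}.py".format(package) in names:
            hits.append(entry)
    return hits


if __name__ == "__main__":
    # Assumed package name; an empty list means no .zip on sys.path provides it.
    print(zips_providing("graphframes"))
```

If this prints an empty list even though the .zip shows up in sys.path, the archive layout is the likely problem rather than the PYTHONPATH setting.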
Another solution was given at: Importing PySpark packages, although the --packages parameter didn't work for me.
Hope this helps.