I am trying to use the graphframes library on Apache Zeppelin with the Spark (pyspark) interpreter, but I keep getting the error:
ModuleNotFoundError: No module named 'graphframes'
whenever I try to import the module using from graphframes import *.
I have tried adding the --packages 'graphframes:graphframes:0.7.0-spark2.4-s_2.11'
directive in the zeppelin-env.sh file, using the z.load('graphframes:graphframes:0.7.0-spark2.4-s_2.11')
function, and adding graphframes as a dependency in the interpreter settings. None of these attempts worked.
I have also tried adding a Spark repository to Zeppelin and then adding the Maven coordinates for graphframes under the dependencies section of the interpreter settings. This did not work either.
I am using Spark 2.4 with Scala 2.11 on Zeppelin 0.8.1, hosted on an EMR cluster.
I am able to use graphframes from the terminal using pyspark and the --packages directive mentioned above, so this seems to be a Zeppelin-related issue.
I am stumped as to what I might do further. Any ideas on how I can get graphframes to work on zeppelin?
I think the problem is your PYTHONPATH in Zeppelin. You can inspect it from a Zeppelin paragraph with:
import sys
print(sys.path)
It works with the pyspark console because there the package ends up in a location that is already part of the PYTHONPATH. You can check that from the pyspark console with:
import graphframes
print(graphframes.__file__)
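To compare the two environments more directly, the search path can be filtered for graphframes entries. This is only a small sketch; path_entries_matching is a hypothetical helper name, not part of Spark or Zeppelin. Running it in the pyspark console and in a Zeppelin paragraph should show whether the jar is on the path in one and missing in the other.

```python
import sys


def path_entries_matching(keyword, paths=None):
    """Return the search-path entries whose text contains `keyword`.

    `paths` defaults to the live sys.path of the current interpreter.
    """
    paths = sys.path if paths is None else paths
    return [entry for entry in paths if keyword in entry]


# An empty list here means no graphframes jar or directory is on the path.
print(path_entries_matching("graphframes"))
```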
So all you have to do is add the package to your PYTHONPATH. First, make sure Spark downloads the jar by adding the following line to
/etc/spark/conf/spark-defaults.conf
(other ways, such as passing the --packages parameter through SPARK_SUBMIT_OPTIONS, should work as well):
spark.jars.packages graphframes:graphframes:0.7.0-spark2.4-s_2.11
After that, add the following line to /etc/spark/conf/spark-env.sh
to extend your PYTHONPATH (adjust the path to wherever the jar was actually downloaded on your machine):
export PYTHONPATH=$PYTHONPATH:/var/lib/zeppelin/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar
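Putting a jar on the PYTHONPATH works because a jar is a zip archive, and Python can import packages from zip archives on its search path. Before restarting, it may be worth confirming that the file at that location exists and really bundles the Python package. The following is a rough sketch under that assumption; jar_contains_package is a hypothetical helper, and the path is simply the one used above, so it may differ on your cluster.

```python
import os
import zipfile


def jar_contains_package(jar_path, package):
    """Return True if the archive holds `package/__init__.py` at its root.

    Returns False when the file does not exist, so a wrong path is easy
    to spot as well.
    """
    if not os.path.isfile(jar_path):
        return False
    with zipfile.ZipFile(jar_path) as jar:
        return f"{package}/__init__.py" in jar.namelist()


# Path taken from the export line above; adjust it to your own location.
jar = "/var/lib/zeppelin/.ivy2/jars/graphframes_graphframes-0.7.0-spark2.4-s_2.11.jar"
print(jar_contains_package(jar, "graphframes"))
```

If this prints False, the jar is either somewhere else or has not been downloaded yet, and the PYTHONPATH entry will have no effect.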
Restart the Spark interpreter in Zeppelin to make sure that all changes are applied.