I am trying to run pyspark with python3 (3.5) in Zeppelin against Spark 2.1.0. I have got the pyspark shell up and running with python3, but flipping over to Zeppelin connecting to the same local cluster gives:
Exception: Python in worker has different version 3.5 than that in driver 2.7, PySpark cannot run with different minor versions
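For context, that exception comes from a sanity check PySpark runs when a worker starts: the worker's major.minor Python version must match the driver's exactly. A minimal sketch of the idea (not PySpark's actual source):

```python
def check_python_versions(worker_version, driver_version):
    # Sketch of the guard (assumed behavior, not PySpark's real code):
    # the major.minor versions on worker and driver must match exactly.
    if worker_version != driver_version:
        raise Exception(
            "Python in worker has different version %s than that in "
            "driver %s, PySpark cannot run with different minor versions"
            % (worker_version, driver_version))

# Zeppelin's driver defaulted to the system python (2.7) while the
# workers ran 3.5, hence the error above:
try:
    check_python_versions("3.5", "2.7")
except Exception as e:
    print(e)
```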
I have modified the default spark-env.sh as follows: (unmodified lines omitted for brevity)
SPARK_LOCAL_IP=127.0.0.1
SPARK_MASTER_HOST="localhost"
SPARK_MASTER_WEBUI_PORT=8080
SPARK_MASTER_PORT=7077
SPARK_DAEMON_JAVA_OPTS="-Djava.net.preferIPv4Stack=true"
export PYSPARK_PYTHON=/Library/Frameworks/Python.framework/Versions/3.5/bin/python3
export PYSPARK_DRIVER_PYTHON=/Library/Frameworks/Python.framework/Versions/3.5/bin/ipython
Starting things up with ./bin/pyspark, all is good in the shell.
Zeppelin config has been modified in zeppelin-site.xml only to move the UI port away from 8080 to 8666. zeppelin-env.sh has been modified as follows: (showing only mods/additions)
export MASTER=spark://127.0.0.1:7077
export SPARK_APP_NAME=my_zeppelin-mf
export PYSPARK_PYTHON=/Library/Frameworks/Python.framework/Versions/3.5/bin/python3
export PYSPARK_DRIVER_PYTHON=/Library/Frameworks/Python.framework/Versions/3.5/bin/ipython
export PYTHONPATH=/Library/Frameworks/Python.framework/Versions/3.5/bin/python3
I've tried using Anaconda, but python 3.6 is currently creating issues with Spark. I've also tried a bunch of combinations of the above config settings without success.
There is a setting referenced in the configs, zeppelin.pyspark.python, which defaults to python, but it is unclear from the docs how/where to adjust that to python3. To help eliminate OSX specifics, I was able to replicate this failure on LinuxMint 18.1 as well.
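One quick way to see which Python the Zeppelin pyspark interpreter actually launched is to print it from a notebook paragraph (paste this into a %pyspark paragraph; it only inspects the driver side):

```python
# Shows which Python binary and version the interpreter process
# is running with, so you can confirm whether the driver picked
# up 2.7 or 3.5.
import sys
print(sys.executable)
print("%d.%d" % sys.version_info[:2])
```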
So I've been rifling through the Zeppelin docs and the Internet trying to find the proper config setting to get Zeppelin to run as a 3.5 driver. With hope I'm missing something obvious, but I cannot seem to track this one down. Hoping someone has done this successfully and can help identify my error.
Thank you.
Naturally, something worked right after posting this...
In the Zeppelin config at ./conf/interpreter.json, for one of my notebooks I found the config:
"properties": {
...
"zeppelin.pyspark.python": "python",
...
}
Changing this to:
"properties": {
...
"zeppelin.pyspark.python": "python3",
...
}
Combined with the same settings as above, this has had the desired effect of getting the notebook to work with python 3.5. However, this seems a bit clunky/hacky and I suspect there is a more elegant way to do this. So I won't call this a solution/answer, but more of a workaround.
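If hand-editing the JSON feels fragile, the same change can be scripted. A sketch, assuming the "interpreterSettings" / "properties" key layout seen in the interpreter.json fragment above (load the file with json.load, pass the dict through this, and write it back while Zeppelin is stopped):

```python
def set_pyspark_python(config, python_path):
    # Flip zeppelin.pyspark.python in every interpreter setting that
    # defines it. Key names are assumed from the interpreter.json
    # fragment shown above; verify against your Zeppelin version.
    for settings in config.get("interpreterSettings", {}).values():
        props = settings.get("properties", {})
        if "zeppelin.pyspark.python" in props:
            props["zeppelin.pyspark.python"] = python_path
    return config
```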