Tags: python-2.7, apache-spark, apache-spark-1.4

Python versions in worker node and master node differ


I am running Spark 1.4.1 on CentOS 6.7, with both Python 2.7 and Python 3.5.1 installed via Anaconda.

I made sure that the PYSPARK_PYTHON env var is set to python3.5, but when I open the pyspark shell and execute a simple RDD transformation, it errors out with the exception below:

Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions

Just wondering what other places there are to change the path.
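For reference, here is a quick check from the pyspark shell that shows which interpreter the driver and the workers actually use (a sketch; sc is the SparkContext the shell creates, and the worker call hits the same exception when the versions differ):

    import sys
    # Python version of the driver process (the pyspark shell itself)
    print(sys.version)
    # Python version of a worker: the lambda runs inside an executor
    print(sc.parallelize([1]).map(lambda _: sys.version).first())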


Solution

  • Did you restart the Spark workers with the new setting? Changing the environment variable only for your driver process is not enough: tasks created by the driver cross process, and sometimes machine, boundaries to be executed. Those tasks are serialized Python code, which is why the worker and driver interpreter versions need to match. See the sketch below for the usual way to apply the setting cluster-wide.
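A minimal sketch for a standalone cluster, assuming an Anaconda install under /opt/anaconda3 (adjust the paths to your layout): set the interpreter in conf/spark-env.sh on the master and on every worker node, then restart the daemons so they pick up the new value.

    # conf/spark-env.sh on the master and on every worker node
    export PYSPARK_PYTHON=/opt/anaconda3/bin/python3.5
    export PYSPARK_DRIVER_PYTHON=/opt/anaconda3/bin/python3.5   # keep the driver on the same version

    # restart the standalone cluster so the workers inherit the new setting
    $SPARK_HOME/sbin/stop-all.sh
    $SPARK_HOME/sbin/start-all.sh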