Tags: pyspark, apache-zeppelin

Setting Specific Python in Zeppelin Interpreter


What do I need to do, beyond setting "zeppelin.pyspark.python", to make a Zeppelin interpreter use a specific Python executable?

Background:

I'm using Apache Zeppelin connected to a Spark+Mesos cluster. The cluster has worked fine for several years; Zeppelin is new and works fine in general.

But I'm unable to import numpy within functions applied to an RDD in pyspark. When I use the Python subprocess module to locate the Python executable, it shows that the code is being run with the system Python, not the virtualenv it needs to be in.
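
A minimal sketch of that check, with sys.executable standing in for the subprocess call (it assumes sc is the SparkContext that Zeppelin's %pyspark interpreter provides):

    import sys

    # sys.executable is the path of the interpreter running this task,
    # so mapping it over an RDD shows which Python the executors use.
    def which_python(_):
        return sys.executable

    # On a correctly configured setup this prints only the virtualenv's
    # python; here it reports the system Python instead.
    print(sc.parallelize(range(4), 2).map(which_python).distinct().collect())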

I've seen a few questions on this issue that say the fix is to set "zeppelin.pyspark.python" to point to the correct Python. I've done that and restarted the interpreter a few times, but it is still using the system Python.

Is there something additional I need to do? This is using Zeppelin 0.7.


Solution

  • On an older, custom snapshot build of Zeppelin I've been using on an EMR cluster, I set the following two properties to use a specific virtualenv:

    "zeppelin.pyspark.python": "/path/to/bin/python",
    "spark.executorEnv.PYSPARK_PYTHON": "/path/to/bin/python"