The following simple script works fine in pyspark when it is run from the terminal:
import pyspark
sc = pyspark.SparkContext()
foo = sc.parallelize([1,2])
But when run in Rodeo, it produces an error, the most important line of which says:
Exception: Python in worker has different version 2.7 than that in driver 3.5, PySpark cannot run with different minor versions
And the full error output can be found at this link:
file contains the following lines:
export PYSPARK_PYTHON=python3
The problem persists despite that, and putting the same lines in ~/.bashrc
doesn't solve it either.
Rodeo version: 1.3.0
Spark version: 1.6.1
Platform: Linux
This issue is related to one described here: link
Rodeo, as a desktop app, has a hard time working with shell environment variables. The trick is to put the variables we'd normally declare in the shell into Rodeo's .rodeoprofile instead, using the os module to add them. Specifically, in this case adding the following lines to .rodeoprofile helped:
(though the second one is redundant; I added it just for consistency, as the driver used 3.5 anyway)
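As a rough sketch of what such .rodeoprofile lines can look like: the value "python3" is an assumption carried over from the export line in the question, and it may need to be a full interpreter path on other setups.

```python
import os

# Assumption: "python3" resolves to the same minor version as the driver (3.5 here).
# This tells Spark which interpreter to launch for the workers.
os.environ["PYSPARK_PYTHON"] = "python3"

# Redundant when the driver already runs under Python 3; set only for consistency.
os.environ["PYSPARK_DRIVER_PYTHON"] = "python3"
```

These assignments have to run before the SparkContext is created, because PySpark reads PYSPARK_PYTHON at that point.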