I ran into a problem running the pi.py script from the Spark Python examples. In yarn-client mode everything works fine, but in yarn-cluster mode the job fails to start and the container log shows a syntax error like this:
LogType:stdout
Log Upload Time:Thu May 21 08:48:16 +0800 2015
LogLength:111
Log Contents:
File "pi.py", line 40
return 1 if x ** 2 + y ** 2 < 1 else 0
I'm sure the script is correct. Can anybody help me out?
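I noticed that the line raising the syntax error uses a conditional expression (A if condition else B), a feature that was only added in Python 2.5. In yarn-cluster mode the driver runs inside the YARN ApplicationMaster on a cluster node instead of on the client machine, which would explain why only that mode fails. So I suspect the problem is the Python version that Spark picks up on the cluster nodes.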
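One way to test this suspicion is to check which interpreter actually runs the code and whether it can parse the line from pi.py. The sketch below is a hypothetical diagnostic, not part of Spark: it compiles the expression from line 40 without executing it, so an interpreter older than 2.5 should raise SyntaxError. It deliberately avoids newer syntax so it can itself be run with a cluster node's default python.

```python
import sys

# The expression from pi.py line 40 that the cluster's interpreter rejects.
PI_LINE = "1 if x ** 2 + y ** 2 < 1 else 0"


def supports_conditional_expression(source=PI_LINE):
    """Return True if this interpreter can parse the given expression."""
    try:
        # compile() only parses; undefined names like x and y are not looked up.
        compile(source, "<pi.py line 40>", "eval")
    except SyntaxError:
        return False
    return True


def describe_interpreter():
    """Return the path and major.minor version of the running interpreter."""
    return "%s %d.%d" % (sys.executable, sys.version_info[0], sys.version_info[1])


if __name__ == "__main__":
    print(describe_interpreter())
    print("conditional expressions supported: %s" % supports_conditional_expression())
```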
To point Spark at a newer interpreter, I added the property spark.yarn.appMasterEnv.PYSPARK_PYTHON to /etc/spark/conf.cloudera.spark_on_yarn/spark-defaults.conf, set to the path of the Python binary.
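Before relying on the property, it may help to confirm that the binary it points to exists and is new enough on every node that can host the ApplicationMaster. A minimal sketch, assuming it is run on each NodeManager host with Python 2.7+ or 3 (needed for subprocess.check_output); the function names are hypothetical.

```python
import subprocess
import sys


def interpreter_version(python_path):
    """Run python_path and return its (major, minor) version as ints."""
    # The probe string uses only syntax that very old interpreters accept.
    out = subprocess.check_output(
        [python_path, "-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
    )
    major, minor = out.decode("ascii").strip().split(".")
    return int(major), int(minor)


def ok_for_pyspark_python(python_path, minimum=(2, 5)):
    """True if python_path can parse conditional expressions (2.5 or newer)."""
    # Tuple comparison orders by major version first, then minor.
    return interpreter_version(python_path) >= minimum


if __name__ == "__main__":
    # Pass the candidate PYSPARK_PYTHON path; defaults to this interpreter.
    path = sys.argv[1] if len(sys.argv) > 1 else sys.executable
    print("%s -> %r, ok: %s" % (path, interpreter_version(path), ok_for_pyspark_python(path)))
```

For example, with a hypothetical path such as /usr/local/bin/python2.7, running the script with that path as its argument should report ok: True on a host where that binary exists and is 2.5 or newer; the same path would then be the value to use for spark.yarn.appMasterEnv.PYSPARK_PYTHON.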