Tags: python, apache-spark, hadoop-yarn

Run a Python Spark job in yarn-cluster mode


I ran into a problem running the pi.py script from Spark's Python examples. In yarn-client mode everything works fine, but in yarn-cluster mode the job fails to start and the container returns a syntax error like this:

LogType:stdout
Log Upload Time:Thu May 21 08:48:16 +0800 2015
LogLength:111
Log Contents:
File "pi.py", line 40
return 1 if x ** 2 + y ** 2 < 1 else 0

I'm sure the script is correct. Can anybody help me out?


Solution

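  • I noticed that the line reported as a syntax error uses a conditional expression (1 if ... else 0), a feature that only exists in newer versions of Python (it was added in Python 2.5). So I suspected the problem was the Python version that Spark was using inside the YARN container.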

    I added the property

    spark.yarn.appMasterEnv.PYSPARK_PYTHON

    to /etc/spark/conf.cloudera.spark_on_yarn/spark-defaults.conf, with the path of the Python binary as its value.
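
As a rough sketch of the fix above, the same property can presumably also be passed at submit time with spark-submit's --conf flag instead of editing spark-defaults.conf. The helper names below (supports_conditional_expression, build_submit_command) and the interpreter path /usr/bin/python2.7 are made up for illustration and are not from the original answer. The default master value yarn-cluster matches the Spark 1.x syntax used in the question; newer Spark releases expect --master yarn --deploy-mode cluster instead.

```python
import sys


def supports_conditional_expression(version_info=sys.version_info):
    """Return True if this interpreter can parse `a if cond else b`.

    Conditional expressions were added in Python 2.5 (PEP 308), so an
    older interpreter reports a syntax error on line 40 of pi.py.
    """
    return tuple(version_info[:2]) >= (2, 5)


def build_submit_command(script, python_path, master="yarn-cluster"):
    """Build a spark-submit argument list that sets PYSPARK_PYTHON for the
    YARN application master through --conf (hypothetical helper)."""
    return [
        "spark-submit",
        "--master", master,
        "--conf", "spark.yarn.appMasterEnv.PYSPARK_PYTHON=" + python_path,
        script,
    ]


if __name__ == "__main__":
    # Assumed interpreter path; replace it with the real one on the cluster nodes.
    cmd = build_submit_command("pi.py", "/usr/bin/python2.7")
    print(" ".join(cmd))
    print("conditional expressions supported here: %s"
          % supports_conditional_expression())
```

Running the version check with the interpreter that YARN actually picks up (for example from a tiny job that prints sys.version) is one way to confirm the diagnosis before changing any configuration.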