amazon-web-services, apache-spark, cluster-computing, amazon-emr

Spark cluster on AWS EMR can't find spark-env.sh


I am playing with Apache Spark on AWS EMR and trying to set the cluster to use Python 3.

I run the following command as the last step in my bootstrap script:

sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh

When I run it, the cluster crashes during bootstrapping with the following error:

sed: can't read /etc/spark/conf/spark-env.sh: No such file or directory

How do I properly set the cluster to use Python 3?

This is not a duplicate of the other question: my issue is that the cluster cannot find the spark-env.sh file while bootstrapping, whereas the other question addresses the system not finding python3.


Solution

  • In the end I did not use that script. Instead, I used the EMR configuration file that is available at the cluster creation stage, which gave me the proper configuration via spark-submit (in the AWS GUI); a sketch of such a configuration follows below. If you need the setting to be available to PySpark scripts in a more programmatic way, you can use os.environ to set the PySpark Python version inside the Python script itself, as in the second sketch.
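
A minimal sketch of what that configuration JSON can look like at creation time. The spark-env/export classification is the documented EMR way to export environment variables into spark-env.sh; the interpreter path is the one from the bootstrap command above and assumes /usr/bin/python3 exists on the cluster nodes:

[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3"
        }
      }
    ]
  }
]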
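
And a minimal sketch of the os.environ approach, assuming the variable is set before the SparkSession (and therefore the SparkContext) is created; the application name is just a placeholder:

import os

# Set this before the SparkContext is created so the Python workers pick it up.
# The path is the one from the bootstrap command above; it must exist on the nodes.
os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"

from pyspark.sql import SparkSession

# "python3-on-emr" is a placeholder app name.
spark = SparkSession.builder.appName("python3-on-emr").getOrCreate()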