I am using AWS EMR and trying to change the bootstrap script so that the default Python in PySpark is Python 3. I am following this tutorial.
The script changes the /usr/lib/spark/conf/spark-env.sh file, but that does not change the Python version PySpark uses: my jobs still run with Python 2.7. It only works when I SSH into the machine and explicitly run:
$ source /usr/lib/spark/conf/spark-env.sh
When I add this line to the bootstrap script, the bootstrap action fails with an error saying the file was not found:
/bin/bash: /usr/lib/spark/conf/spark-env.sh: No such file or directory
I assume the file does not exist yet at this stage. How can I set the PySpark Python to Python 3 from the bootstrap script?
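Bootstrap actions run before EMR installs Spark, which is why spark-env.sh does not exist yet at that stage. Instead of editing the file, add the following to the software configuration when creating the cluster (Create EMR -> Step 1: Software and Steps -> Edit software configuration -> Enter configuration):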
[
  {
    "Classification": "spark-env",
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "PYSPARK_PYTHON": "/usr/bin/python3"
        }
      }
    ]
  }
]
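
If you create clusters from a script rather than the console, the same configuration can be generated programmatically. The following is only a minimal sketch: the helper name `spark_env_python3` is hypothetical, and the interpreter path is assumed to be /usr/bin/python3 as in the JSON above.

```python
import json


def spark_env_python3(python_path="/usr/bin/python3"):
    """Build the spark-env classification that exports PYSPARK_PYTHON.

    The helper name and the default path are assumptions for illustration;
    adjust the path if python3 lives elsewhere on your EMR release.
    """
    return [
        {
            "Classification": "spark-env",
            "Configurations": [
                {
                    "Classification": "export",
                    "Properties": {"PYSPARK_PYTHON": python_path},
                }
            ],
        }
    ]


if __name__ == "__main__":
    # Print the configuration as JSON so it can be saved to a file.
    print(json.dumps(spark_env_python3(), indent=2))
```

The printed JSON can be saved to a file (for example configurations.json, an arbitrary name) and passed to `aws emr create-cluster` with `--configurations file://configurations.json`. With boto3, the list returned by the helper should be usable as the `Configurations` argument of `run_job_flow`. Once the cluster is up, running `import sys; print(sys.version)` in a pyspark shell is a quick way to confirm which interpreter is in use.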