In AMI 3, the file /home/hadoop/conf/hadoop-user-env.sh existed, and the legacy code I'm looking at ran this command during bootstrapping:
echo ". /home/hadoop/resources/pips/bin/activate" >> /home/hadoop/conf/hadoop-user-env.sh
This activates a virtualenv for Python.
In AMI 4 this file is gone. How am I supposed to get a Python step in Hadoop to run in a virtualenv under AMI 4?
Going to give this a shot and hope it helps you.
In Amazon EMR AMI versions 2.x and 3.x, there was a hadoop-user-env.sh script that was not part of standard Hadoop. It was used along with the configure-daemons bootstrap action to configure the Hadoop environment. A bootstrap script would append settings to it like this:
#!/bin/bash
export HADOOP_USER_CLASSPATH_FIRST=true;
echo "HADOOP_CLASSPATH=/path/to/my.jar" >> /home/hadoop/conf/hadoop-user-env.sh
In Amazon EMR release 4.x, you can now do the same with the hadoop-env configuration classification:
[
  {
    "Classification": "hadoop-env",
    "Properties": {},
    "Configurations": [
      {
        "Classification": "export",
        "Properties": {
          "HADOOP_USER_CLASSPATH_FIRST": "true",
          "HADOOP_CLASSPATH": "/path/to/my.jar"
        }
      }
    ]
  }
]
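If you need to export several variables, it may be easier to generate this JSON than to write it by hand. Here is a minimal sketch in Python; the helper name `hadoop_env_exports` is made up for illustration and is not part of any EMR tooling, and the variables are just the two from the example above.

```python
import json


def hadoop_env_exports(variables):
    """Build a hadoop-env configuration whose nested "export"
    classification carries the given environment variables.

    Hypothetical helper for illustration only.
    """
    return [
        {
            "Classification": "hadoop-env",
            "Properties": {},
            "Configurations": [
                {
                    "Classification": "export",
                    # Values are passed as strings, as in the JSON above.
                    "Properties": {
                        name: str(value) for name, value in variables.items()
                    },
                }
            ],
        }
    ]


config = hadoop_env_exports(
    {
        "HADOOP_USER_CLASSPATH_FIRST": "true",
        "HADOOP_CLASSPATH": "/path/to/my.jar",
    }
)

# Print the JSON so it can be redirected to a file.
print(json.dumps(config, indent=2))
```

You could save the output to a file (say hadoop-env.json, a name I picked) and pass it when creating the cluster with the --configurations option of aws emr create-cluster, for example --configurations file://./hadoop-env.json.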
There is more information about the differences and the replacement configurations in Amazon's EMR documentation.