Tags: hadoop, amazon-web-services, environment-variables, bootstrapping, emr

Environment variables set in a bootstrap script do not take effect in AWS EMR


I am setting several environment variables in my bootstrap script:

export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/

The script then uses one of the variables defined above:

$HADOOP_CMD fs -mkdir /home/hadoop/contents
$HADOOP_CMD fs -put /home/hadoop/contents/* /home/hadoop/contents/

Execution fails with this error message:

/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 3: fs: command not found
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 4: fs: command not found
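The wording of the message gives a hint: the shell tried to run `fs` as the command, which is what happens when `$HADOOP_CMD` expands to nothing at the point of use. The following is a minimal sketch that reproduces the same kind of error outside EMR. It assumes bash, and it sets the variable to an empty string on purpose to imitate an export that never ran in this shell; it does not show why the variable was empty on the cluster.

```shell
#!/bin/bash
# Assumption: HADOOP_CMD is empty at the point of use, as it would be
# if the export never ran in this shell process.
HADOOP_CMD=""

# Unquoted, the empty expansion vanishes, so the shell looks up "fs"
# as the command name instead of running hadoop.
status=0
output=$( { $HADOOP_CMD fs -mkdir /home/hadoop/contents; } 2>&1 ) || status=$?

echo "exit status: $status"
echo "shell said:  $output"
```

An exit status of 127 is the shell's conventional code for a command it could not find, which matches the "command not found" text in the log above.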

cycle0.sh is the name of my bootstrap script.

Any comments as to what is happening here?


Solution

  • I found a proper solution to my problem. My attempts to copy data files from S3 to EMR using hadoop fs commands were futile. I have since learned about the S3DistCp command, which EMR provides for file transfer, so I am skipping the $HADOOP_CMD approach. For those interested in how S3DistCp works, see the AWS EMR documentation. I still do not understand why the bootstrap script does not honor an environment variable in subsequent statements.
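On the open question, one possible explanation (not confirmed anywhere in this thread) is that the exports and the hadoop commands did not run in the same shell process. The error points at lines 3 and 4 of the script, whereas the snippet as posted would place those commands after four export lines, so the file that actually ran may not have contained the exports. An exported variable is visible only to the process that set it and to its children; it does not carry over to a separately launched script. The sketch below illustrates that behavior with two child bash processes and adds a guard, a hypothetical addition rather than anything from the original script, that fails early with a clear message.

```shell
#!/bin/bash
# Start from a clean slate so the demonstration is not affected by the caller.
unset HADOOP_CMD

# First child process: the export is real, but it dies with the process.
bash -c 'export HADOOP_CMD=/home/hadoop/bin/hadoop'

# Second child process: inherits only the parent's environment.
seen_by_second=$(bash -c 'echo "${HADOOP_CMD:-unset}"')
echo "second process sees HADOOP_CMD as: $seen_by_second"

# Guard (hypothetical addition to a bootstrap script): abort with a clear
# message instead of letting an empty variable turn into "fs: command not found".
guard_status=0
( : "${HADOOP_CMD:?HADOOP_CMD is not set in this shell}" ) 2>/dev/null || guard_status=$?
echo "guard exit status: $guard_status"
```

If this is what happened, keeping the exports and the commands that depend on them in the same script file, with a guard like the one above near the top, would make the failure easier to diagnose.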