I am setting several environment variables in my bootstrap code:
export HADOOP_HOME=/home/hadoop
export HADOOP_CMD=/home/hadoop/bin/hadoop
export HADOOP_STREAMING=/home/hadoop/contrib/streaming/hadoop_streaming.jar
export JAVA_HOME=/usr/lib64/jvm/java-7-oracle/
This is followed by commands that use one of the variables defined above:
$HADOOP_CMD fs -mkdir /home/hadoop/contents
$HADOOP_CMD fs -put /home/hadoop/contents/* /home/hadoop/contents/
Execution fails with the following error messages:
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 3: fs: command not found
/mnt/var/lib/bootstrap-actions/2/cycle0_unix.sh: line 4: fs: command not found
cycle0.sh is the name of my bootstrap script.
Any comments as to what is happening here?
I found a proper solution to my problem. My attempts to copy data files from S3 to EMR using hadoop fs commands were futile. I have just learned about the S3DistCp command, which is available in EMR for file transfer, so I am skipping the $HADOOP_CMD method. For those who care how S3DistCp works, see the AWS EMR docs.
I still do not understand why the bootstrap script will not accept an environment variable in subsequent statements.
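A possible explanation, offered as an assumption rather than a confirmed diagnosis: the message "fs: command not found" is what bash prints when $HADOOP_CMD expands to nothing, so that fs becomes the command name. The error also points at lines 3 and 4 of the script, which hints that the export lines may not be in the same file (or the same bootstrap action) as the failing commands; variables exported in one script are not visible to a separately launched one. The sketch below reproduces the symptom without Hadoop and shows a guard that fails early with a clearer message. The path /tmp/example and the guard message are placeholders.

```shell
#!/bin/bash
# Sketch only: reproduces the symptom in child shells, no Hadoop required.

# 1. With HADOOP_CMD unset, the unquoted expansion disappears and bash
#    tries to run "fs" as the command name.
unset_output=$(bash -c 'unset HADOOP_CMD; $HADOOP_CMD fs -mkdir /tmp/example' 2>&1)
unset_status=$?
echo "status=$unset_status output=$unset_output"

# 2. A guard near the top of the script makes the real cause explicit:
#    ${VAR:?message} aborts a non-interactive shell if VAR is unset or empty,
#    so the line after it is never reached.
guard_output=$(bash -c 'unset HADOOP_CMD; : "${HADOOP_CMD:?HADOOP_CMD is not set}"; echo reached' 2>&1)
guard_status=$?
echo "status=$guard_status output=$guard_output"
```

If the guard fires in the real bootstrap script, the exports are not reaching the shell that runs the hadoop commands, and moving them into the same script (or sourcing the file that defines them) would be the next thing to try.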