I want to run the following command:
hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input
It works when I run it from a shell after SSHing onto the master node, but it fails when I invoke it through ssh as follows:
ssh -i /home/USER/keypair.pem [email protected] hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input
It gives the error:
zsh: command not found: hadoop
But if I drop the final piped command, it succeeds:
ssh -i /home/USER/keypair.pem [email protected] hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz"
From some searching I've found that this can be caused by JAVA_HOME not being set, but it is set correctly in ~/.bashrc on the master node.
The Hadoop cluster is an Amazon Elastic MapReduce cluster.
Only the first command of your pipeline is executed on the remote host; the rest of the pipeline runs locally on your own machine. The local shell parses the unquoted `|` characters before ssh ever sees them. So, of course, if you don't have hadoop installed locally, zsh prints "command not found" (and if you did have it installed, the final `hadoop fs -put` would write to your local Hadoop, which is probably not what you want).
To pass the whole pipeline to ssh as one remote command, wrap it in double quotes "" or single quotes '':
ssh -i /home/USER/keypair.pem [email protected] 'hadoop fs -ls hdfs:///logs/ | grep -oh "/[^/]*.gz" | grep -oh "[^/]*.gz" | hadoop fs -put - hdfs:///unzip_input/input'
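You can see the same parsing behavior without an EMR cluster. In this sketch, a hypothetical `remote` function (using `sh -c` as a stand-in for `ssh user@host`) shows that an unquoted pipe is consumed by the local shell, while a quoted pipeline is handed to the "remote" shell intact:

```shell
#!/bin/sh
# Hypothetical stand-in for `ssh user@host`: runs its arguments in a
# child shell, the way ssh runs them in a shell on the remote host.
remote() { sh -c "$*"; }

# Unquoted: the local shell splits at '|', so only `echo a.gz` reaches
# `remote`; the grep runs locally -- the same split the question hits.
remote echo a.gz | grep -o 'a\.gz'

# Quoted: the entire pipeline is a single argument, so the child
# ("remote") shell executes both the echo and the grep.
remote 'echo a.gz | grep -o "a\.gz"'
```

Both invocations print `a.gz` here because grep happens to exist locally too; the difference only bites when, as with `hadoop`, the downstream commands exist solely on the remote host.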