Tags: java, hadoop, ssh, pipe, dd

Can't pipe Output of Hadoop Command


I want to run the following command:

 hadoop fs -copyToLocal FILE_IN_HDFS | ssh REMOTE_HOST "dd of=TARGET_FILE"

However, when I try it, all it does is create an empty file on the target host and copy the file to my local home directory, instead of copying it to the remote location.

$ hadoop fs -copyToLocal FILE_IN_HDFS | ssh REMOTE_HOST "dd of=test.jar"

0+0 records in

0+0 records out

0 bytes (0 B) copied, 1.10011 s, 0.0 kB/s

I cannot think of any reason why this command would behave this way. Is this some Java-ism that I'm missing here, or am I actually doing it wrong?


Solution

  • The -copyToLocal option expects two arguments: the file in HDFS and the local path. I don't even see how this could copy to your local drive; this command fails for me.

    But I think the actual issue is different: the -copyToLocal option doesn't print anything to stdout that could be piped to the ssh command. You're essentially piping an empty stream to dd, so there is nothing for it to write.
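
    You can reproduce this failure mode locally, without Hadoop: piping an empty stream into dd still creates (or truncates) the output file but writes zero bytes to it. A minimal sketch, using `true` as a stand-in for a command that produces no stdout (the path is just an example):

    ```shell
    # 'true' writes nothing to stdout, like -copyToLocal in the question;
    # dd still creates the target file, but it ends up empty.
    true | dd of=/tmp/empty_target.bin 2>/dev/null

    # The target exists but contains 0 bytes.
    wc -c < /tmp/empty_target.bin
    ```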

    I would do the following command which seems to work:

    hadoop fs -cat $FILE_IN_HDFS | ssh $REMOTE_HOST "dd of=$TARGET_FILE"
    

    This way you are piping a stream containing the content of your file and copying it into the file pointed to by $TARGET_FILE. Tested on my box, and this works fine.

    This avoids the need to copy the file locally and then scp it to the remote box; everything is streamed, which I believe is what you are looking for.
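
    The streaming principle itself can be sanity-checked without a cluster by swapping `hadoop fs -cat` for plain `cat` and dropping the ssh hop; the bytes flow through the pipe into dd unchanged. A local simulation (all filenames are hypothetical):

    ```shell
    # Create a small source file standing in for FILE_IN_HDFS.
    printf 'hello from hdfs\n' > /tmp/source_file

    # Stream it through a pipe into dd, as the remote side of the
    # ssh command would receive it.
    cat /tmp/source_file | dd of=/tmp/target_file 2>/dev/null

    # Source and target should be byte-for-byte identical.
    cmp /tmp/source_file /tmp/target_file && echo "identical"
    ```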