
Transfer files out of HDFS


I want to transfer files out of HDFS to the local filesystem of a different server that is not part of the Hadoop cluster but is on the same network.

I could run:

hadoop fs -copyToLocal <src> <dest>
and then scp/ftp the files to <toMyFileServer>.

Because the data is huge and the local filesystem of the Hadoop gateway machine has limited space, I want to avoid this and send the data directly to my file server.

Any pointers on how to handle this would be appreciated.


Solution

  • Your Hadoop job probably wrote its output as a directory of part files:

    part-r-00000
    part-r-00001
    part-r-00002
    part-r-00003
    part-r-00004
    

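    To keep local disk usage low, handle one part at a time: copy it out of HDFS, scp it to the file server, and delete the local copy before moving on to the next one.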

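    for i in $(seq 0 4); do
      # Zero-padded name (part-r-00000, part-r-00001, ...), so this
      # keeps working past part-r-00009
      part=$(printf 'part-r-%05d' "$i")
      hadoop fs -copyToLocal "output/$part" ./
      scp "./$part" you@somewhere:/home/you/
      rm "./$part"
    done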
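
Hard-coding `seq 0 4` assumes the job produced exactly five parts. A minimal sketch that asks HDFS for the part files instead, assuming the usual `hadoop fs -ls` output where the path is the last column (the helper name `list_parts` is hypothetical, not part of Hadoop):

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Hadoop): print the HDFS path of every
# part file in a job output directory, one per line.
# Assumes the usual `hadoop fs -ls` layout, where the path is the last
# column and a "Found N items" header line may come first.
# Paths containing spaces are not handled.
list_parts() {
  hadoop fs -ls "$1" | awk '$NF ~ /\/part-/ { print $NF }'
}
```

The loop above can then iterate with `for part in $(list_parts output); do ... done`, using `basename "$part"` as the local file name. Files such as `_SUCCESS` are skipped because they do not match `/part-`.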
    

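    Note that scp prompts for a password on every transfer and has no option for supplying one on the command line, so set up key-based SSH authentication (for example with ssh-keygen and ssh-copy-id) if you want the loop to run unattended.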
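
A sketch of a variant that skips the local copy entirely, assuming the gateway can ssh to the file server: `hadoop fs -cat` writes the file to stdout, and ssh pipes it into `cat` on the remote side, so nothing lands on the gateway's disk. The function name `stream_to_remote` is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper (name is illustrative): stream one HDFS file
# straight into a file on another machine over ssh, so nothing is
# written to the gateway's local disk.
#   $1 = HDFS path, $2 = ssh destination (e.g. you@somewhere),
#   $3 = remote file path (assumed to contain no single quotes).
# The body runs in a subshell so `set -o pipefail` stays local: a failed
# `hadoop fs -cat` then fails the function instead of being masked by
# ssh's exit status.
stream_to_remote() (
  set -o pipefail
  hadoop fs -cat "$1" | ssh "$2" "cat > '$3'"
)
```

For example, `stream_to_remote output/part-r-00000 you@somewhere /home/you/part-r-00000` transfers one part; call it once per part file. Because the data is never buffered locally, the gateway's free space no longer limits the transfer.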