
Hadoop distcp issue while copying a single file


(Note: I need to use distcp to get parallelism)

I have 2 files in the /user/bhavesh folder.

I have 1 file in the /user/bhavesh1 folder.

Copying 2 files from /user/bhavesh to the /user/uday folder (this works fine).

This creates the /user/uday folder.

Copying 1 file from /user/bhavesh1 to the /user/uday1 folder: it creates a file named uday1 instead of a folder.

What I need: if there is one file, /user/bhavesh1/emp1.csv, distcp should create /user/uday1/emp1.csv (uday1 should be created as a directory). Any suggestion or help is highly appreciated.


Solution

  • On Unix systems, some copy tools (rsync, for example) create a missing destination directory when the destination path ends with a slash, such as /user/uday1/. Plain cp does not, and the hadoop fs -cp command likewise fails if the destination directory is missing.

    With hadoop distcp, a trailing / on the destination is ignored when the source is a single file, so the file is written to a file named uday1 rather than into a directory. One workaround is to create the destination directory before running distcp. Add the -p option to -mkdir to avoid a "directory already exists" error.

    hadoop fs -mkdir -p /user/uday1  ; hadoop distcp /user/bhavesh1/emp*.csv /user/uday1/  
    

    This works for both a single file and multiple files in the source directory.
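
    The workaround above can be wrapped in a small shell function. This is only a sketch: the function name copy_into_dir and the HADOOP_BIN variable are hypothetical names introduced here for illustration (HADOOP_BIN lets you substitute echo for a dry run), and it assumes the hadoop command is on your PATH.

```shell
# Sketch: copy one or more HDFS paths into a destination directory with distcp.
# copy_into_dir and HADOOP_BIN are hypothetical names, not part of Hadoop.
copy_into_dir() {
    dest_dir=$1
    shift
    # Defaults to the real hadoop CLI; set HADOOP_BIN=echo to print the commands instead.
    hadoop_bin=${HADOOP_BIN:-hadoop}

    # Create the destination first; -p avoids an error if it already exists.
    $hadoop_bin fs -mkdir -p "$dest_dir" || return 1

    # With the directory in place, a single source file lands inside it
    # instead of being written as a file named after the directory.
    $hadoop_bin distcp "$@" "${dest_dir%/}/"
}
```

    For example, copy_into_dir /user/uday1 '/user/bhavesh1/emp*.csv' runs the same two commands as the one-liner above. Quoting the glob keeps the local shell from expanding it, and stopping when mkdir fails avoids running distcp against a destination that could not be created.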