I have a directory in HDFS that gets new files every two days. I want to copy all the files in this directory to another, such that if a new file arrives today, it is also copied to the duplicate directory.
How can we do that in HDFS?
I know we can do this in Linux using rsync. Is there a similar method in HDFS?
No, there are no file-sync methods available in HDFS. You have to run either hdfs dfs -cp or hadoop distcp manually, or through a scheduler such as cron.
If the number of files is large, distcp is preferred:
hadoop distcp -update <src_dir> <dest_dir>
The -update flag copies only files that are missing from the destination or that differ from it in size, block size, or checksum, so repeated runs are incremental.
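To approximate rsync-like behavior, you can wrap the distcp command in a script and schedule it with cron. A minimal sketch, assuming hypothetical paths (namenode address, source/destination directories, and log file all need to be adjusted for your cluster):

```shell
#!/bin/bash
# sync_hdfs.sh -- one-way incremental sync between two HDFS directories.
# All paths below are placeholders; change them for your environment.

SRC=hdfs://namenode:8020/data/incoming
DEST=hdfs://namenode:8020/data/duplicate

# -update makes repeated runs incremental: only files missing from
# DEST, or differing in size/blocksize/checksum, are copied.
hadoop distcp -update "$SRC" "$DEST" >> /var/log/hdfs_sync.log 2>&1
```

A crontab entry (via crontab -e) could then run it on the same cadence as the incoming files, for example every two days at 01:00:

```shell
0 1 */2 * * /path/to/sync_hdfs.sh
```

Note this sketch requires a working Hadoop client on the host running cron, and it only propagates new or changed files; it does not delete files from the destination that were removed from the source (distcp has a -delete option for that, which should be used with care).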