
Overwrite destination with hadoop fs mv?


Doing a quick test of the form

testfunc() {
    hadoop fs -rm /test001.txt
    hadoop fs -touchz /test001.txt
    hadoop fs -setfattr -n trusted.testfield -v $(date +"%T") /test001.txt
    hadoop fs -mv /test001.txt /tmp/.
    hadoop fs -getfattr -d /tmp/test001.txt
}
testfunc
testfunc

resulting in output

... during second function call
mv: '/tmp/test001.txt': File exists
# file: /tmp/test001.txt
trusted.testfield="<old timestamp from first call>"
...

it seems that (unlike in Linux) the hadoop fs -mv command does not overwrite the destination file if it already exists. Is there a way to force overwrite behavior? (I suppose I could check for and delete the destination each time, but something like hadoop mv -overwrite <source> <dest> would be more convenient for my purposes.)

** By the way, if I am interpreting the results incorrectly or the behavior just seems wrong, let me know (I had assumed that overwriting was the default behavior and am asking because I was surprised to find it apparently isn't).


Solution

  • I don't think there is a direct option to move and overwrite files from one HDFS location to another, although copying (the cp command) does have a force option (-f). The Apache Hadoop HDFS design documentation (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html) notes that HDFS applications use a write-once-read-many access model, which is why overwriting is limited; a delete-then-move workaround is sketched below.
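
    As a workaround, deleting the destination before the move gives the
    overwrite effect. Here is a minimal sketch of such a helper, assuming a
    Hadoop 2.x fs shell where -rm accepts -f (which suppresses the error
    when the target does not exist); mv_overwrite is a hypothetical name,
    not a built-in command:

    # hypothetical helper: emulate an overwriting move on HDFS
    mv_overwrite() {
        src="$1"
        dest="$2"
        # clear the way: remove the destination file if it is present
        hadoop fs -rm -f "$dest"
        hadoop fs -mv "$src" "$dest"
    }

    # usage: mv_overwrite /test001.txt /tmp/test001.txt

    Note that dest must be the full destination file path (not a directory
    like /tmp/.), since the rm targets the exact file that would otherwise
    block the move.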