I ran a quick test of the form:
testfunc() {
hadoop fs -rm /test001.txt
hadoop fs -touchz /test001.txt
hadoop fs -setfattr -n trusted.testfield -v $(date +"%T") /test001.txt
hadoop fs -mv /test001.txt /tmp/.
hadoop fs -getfattr -d /tmp/test001.txt
}
testfunc
testfunc
which results in the following output:
... during second function call
mv: '/tmp/test001.txt': File exists
# file: /tmp/test001.txt
trusted.testfield="<old timestamp from first call>"
...
It seems that, unlike mv on Linux, the hadoop fs -mv command does not overwrite a destination file if it already exists. Is there a way to force overwrite behavior? I suppose I could check for and delete the destination each time, but something like hadoop mv -overwrite <source> <dest> would be more convenient for my purposes.
** By the way, if I am interpreting the results incorrectly, or the behavior itself seems wrong, let me know. I had assumed that overwriting was the default behavior, and I am asking this question because I was surprised that it does not seem to be.
I think there is no direct option to move and overwrite files from one HDFS location to another, although copying (the cp command) has an option to force overwriting (-f). According to the Apache Hadoop documentation (https://hadoop.apache.org/docs/r1.2.1/hdfs_design.html), HDFS is designed around a write-once-read-many access model, which limits overwriting.
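As a workaround, the "delete the destination, then move" approach mentioned in the question can be wrapped in a small shell function. The following is only a minimal sketch: hdfs_mv_overwrite is a hypothetical helper name, not a Hadoop command, and it assumes a Hadoop version whose hadoop fs -rm accepts -f (which suppresses the error when the file does not exist).

```shell
# Hypothetical helper (not part of Hadoop): move SRC to DEST on HDFS,
# replacing an existing destination file.
# Not atomic: another client could recreate the target between -rm and -mv.
hdfs_mv_overwrite() {
    src=$1
    dest=$2
    # If DEST is an existing directory, -mv would place SRC inside it,
    # so the file that may need removing is DEST/<basename of SRC>.
    if hadoop fs -test -d "$dest"; then
        target="${dest%/}/$(basename "$src")"
    else
        target=$dest
    fi
    # -rm -f does not fail when TARGET is missing; -mv runs only if -rm succeeded.
    hadoop fs -rm -f "$target" && hadoop fs -mv "$src" "$target"
}
```

With this defined, hdfs_mv_overwrite /test001.txt /tmp would replace the hadoop fs -mv line in the test function. Because -rm is used without -r, an existing directory at the target path is not deleted; the -rm fails and the move is skipped. An alternative is hadoop fs -cp -f followed by removing the source, but that copies the data instead of renaming it.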