Tags: hadoop, command-line, hive, hdfs

Hadoop: unzip files in a directory and move each of them individually to another folder


I am trying to unzip hundreds of files in HDFS and move each of them individually to another folder so that they can be loaded into an external table in Hive. I tried the following command, but it produced only a single merged file, with no name, in the target directory.

!hdfs dfs -cat /user/[somedir1]/* | hadoop fs -put - /user/[somedir2]/uncompressed/


I need (for instance) 100 compressed files to be decompressed, with each decompressed file moved to the target directory individually, for debugging purposes. I can't use another programming language, as that would make the project more complicated than it needs to be. I think this can be done from the HDFS command line; I just don't know the right one-line syntax.
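For context (an explanation inferred from the command, not stated in the original post): `hdfs dfs -cat` with a glob streams the contents of every matching file, back to back, into a single pipe, and `hadoop fs -put -` writes that one stdin stream as a single destination file, which is why only one merged, unnamed file appears. A per-file version of the same pipeline, sketched for a single hypothetical file (placeholder paths, gzip-compressed input assumed), would be:

    # Decompress ONE compressed HDFS file into the target dir under a new name
    # (file name is hypothetical; gunzip -c decompresses the piped bytes to stdout)
    hdfs dfs -cat /user/somedir1/file001.gzip | gunzip -c | hdfs dfs -put - /user/somedir2/uncompressed/file001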


Solution

  • Found a one-line shell solution that decompresses each individual file completely.

    # For each *.gzip file in somedir1, decompress it into somedir2 under the same basename
    for FILE in somedir1/*; do if [[ $FILE == *.gzip ]]; then newname="somedir2/$(basename -s .gzip "$FILE")"; zcat "$FILE" > "$newname"; fi; done
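Note that the loop above runs against local filesystem paths. If the compressed files live in HDFS, as in the question, the same per-file idea can be sketched with the HDFS CLI alone, generalizing the single-file pipeline shown earlier (untested; assumes the `-C` flag of `hdfs dfs -ls`, which prints bare paths, no spaces in the file names, and gzip-compressed inputs):

    # List the .gzip files in the HDFS source dir (-C prints paths only),
    # then decompress each one into the HDFS target dir under its original basename
    for FILE in $(hdfs dfs -ls -C /user/somedir1 | grep '\.gzip$'); do
        newname=$(basename "$FILE" .gzip)
        hdfs dfs -cat "$FILE" | gunzip -c | hdfs dfs -put - "/user/somedir2/uncompressed/$newname"
    done

Either way, once the decompressed files are in the target directory, a Hive external table can be pointed at it with a `LOCATION '/user/somedir2/uncompressed/'` clause.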