Tags: hadoop, hdfs, archive, hadoop-archive, bigdata

Archiving incoming small HDFS files


Small files arrive in HDFS every day. I am planning to use Hadoop Archives (HAR), but how can I archive files that keep arriving daily? For example, I might get 5 files today that I need to archive, and if I get 5 more tomorrow, I need to append them to the previous day's archive.


Solution

  • You cannot add files to an existing HAR file; archives are immutable once created. You need to either un-archive and re-archive with the new files included, or pool the incoming files for some days and create a new archive for each batch going forward.
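
The pooling approach could be sketched roughly as below. This is only a minimal illustration, assuming each day's files land in their own subdirectory (for example /data/incoming/2024-01-15) and that one archive per day is acceptable. The paths, the archive naming scheme, and the helper name build_archive_command are all hypothetical; only the hadoop archive options (-archiveName, -p) come from the Hadoop CLI. The script prints the command instead of running it.

```python
import shlex
from datetime import date


def build_archive_command(day, parent_dir, dest_dir):
    """Build the `hadoop archive` argv for one day's batch of small files.

    Assumes (hypothetically) that the day's files sit in a subdirectory of
    parent_dir named after the date, e.g. /data/incoming/2024-01-15.
    """
    day_dir = day.strftime("%Y-%m-%d")
    archive_name = f"files-{day_dir}.har"  # hypothetical naming scheme
    return [
        "hadoop", "archive",
        "-archiveName", archive_name,
        "-p", parent_dir,   # parent path that the source is relative to
        day_dir,            # source directory, relative to parent_dir
        dest_dir,           # HDFS directory where the .har is written
    ]


if __name__ == "__main__":
    # Hypothetical paths; replace with the real landing and archive directories.
    cmd = build_archive_command(date.today(), "/data/incoming", "/data/archives")
    # Print rather than execute, so the sketch is safe to run anywhere.
    print(shlex.join(cmd))
```

Run once per day (or once per pooling window) from a scheduler, this produces a new archive per batch rather than trying to modify an existing one. To pool over several days instead, the same idea applies with a directory that covers the whole window.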