
MapReduce Job with HAR file input


I have created a HAR file containing multiple small input files. For running a MapReduce job with a single input file, this is the command:

hadoop jar <jarname> <packagename.classname> <input> <output>

But if the above <input> is a HAR file, what should the command be so that all the contents of the HAR file are treated as input?


Solution

  • If the input is a HAR file, then in place of the input path, give a path with the `har://` scheme:

    har:///hdfs path to har file
    

    Since Hadoop archives are exposed as a file system, MapReduce is able to use all the files inside the archive as input.
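As a concrete sketch (the archive name, paths, jar, and class below are hypothetical placeholders, and the commands require a running Hadoop cluster):

```shell
# Pack the small files under /user/me/input into an archive named foo.har,
# stored under /user/me (all names here are assumptions for illustration).
hadoop archive -archiveName foo.har -p /user/me/input /user/me

# Run the job with the archive as input. The har:/// URI addresses the
# archive through the HAR file system, so every file inside foo.har is
# treated as job input.
hadoop jar myjob.jar com.example.MyJob har:///user/me/foo.har /user/me/output
```

Note that reading through `har:///` does not change the number of input splits: each small file inside the archive is still processed as its own input, so a HAR reduces NameNode memory pressure but not the per-file map overhead.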