hadoop, mapreduce

map reduce output files: part-r-* and part-*


I have some questions about map reduce output part files.

    1> What are the differences between part-r-* files and part-* files in MapReduce output? Is part-r-* the output from the mapper and part-* the output from the reducer?
    2> If the reducer doesn't produce any results, will the mapper output remain or be deleted?

Solution

  • Normally, part-r-* files come from the reducer. MultipleOutputs lets you use a different naming convention. If there is no reduce step (a map-only job), the output files will be part-m-*. As I understand it, if a reducer is defined, the mapper output is only intermediate data and is deleted regardless of whether the reducers produce anything. The reducer output files are usually created even when they are empty, unless you use LazyOutputFormat. Where did you find part-* files that did not end with either m-nnnnn or r-nnnnn?
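To make these rules concrete, here is a small, self-contained Java model of which part files end up in a job's output directory. It is not Hadoop code: `PartFileNames`, `partFileName`, and `finalOutputFiles` are hypothetical names, and the `lazy` flag only imitates the effect of LazyOutputFormat (a file is created on the first write, so a task that writes nothing leaves no file). In the new `org.apache.hadoop.mapreduce` API the real names are built by `FileOutputFormat.getUniqueFile`, which follows the same `name-{m|r}-NNNNN` pattern.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative model (not Hadoop code) of MapReduce part-file naming. */
public class PartFileNames {

    /** Builds a name like part-r-00003: base name, task type (m/r), 5-digit partition. */
    static String partFileName(boolean fromMapper, int partition) {
        return String.format("part-%s-%05d", fromMapper ? "m" : "r", partition);
    }

    /**
     * recordsPerTask[i] is the number of records written by task i.
     * With reducers these are reduce tasks: map output is intermediate and never
     * lands in the output directory. With zero reducers they are map tasks.
     * lazy imitates LazyOutputFormat: no record written means no file created.
     */
    static List<String> finalOutputFiles(boolean mapOnly, int[] recordsPerTask, boolean lazy) {
        List<String> files = new ArrayList<>();
        for (int i = 0; i < recordsPerTask.length; i++) {
            if (lazy && recordsPerTask[i] == 0) {
                continue; // the output file is never opened, so it never exists
            }
            files.add(partFileName(mapOnly, i)); // default: created even if empty
        }
        return files;
    }

    public static void main(String[] args) {
        // Two reducers, the second of which emits nothing.
        System.out.println(finalOutputFiles(false, new int[] {5, 0}, false)); // [part-r-00000, part-r-00001]
        System.out.println(finalOutputFiles(false, new int[] {5, 0}, true));  // [part-r-00000]
        // Map-only job with three mappers.
        System.out.println(finalOutputFiles(true, new int[] {1, 2, 3}, false)); // [part-m-00000, part-m-00001, part-m-00002]
    }
}
```

As for plain part-00000 names with no m or r, as far as I know those come from jobs written against the older `org.apache.hadoop.mapred` API (Hadoop Streaming jobs, for example), which may be where such files were seen.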