java hadoop mapreduce hadoop2 hadoop-partitioning

In a MapReduce word count program, how do I get the names of the files in which each word occurs?


I am reading multiple input files for a word count problem.

Example file names: file1.txt file2.txt file3.txt

I am able to get the word count, but what should I add to also get, along with each count, the names of the files in which the word occurs?

For example:

Contents of file 1: welcome to Hadoop

Contents of file 2: This is hadoop

Current output:

Hadoop 2

Is 1

This 1

To 1

Welcome 1

Expected output:

Hadoop 2 File01.txt File02.txt

Is 1 File02.txt

This 1 File02.txt

To 1 File01.txt

Welcome 1 File01.txt


Solution

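  • First, in the mapper, get the name of the file behind the current input split with String file = ((FileSplit) inputSplit).getPath().getName(); and emit the word and the file name as the mapper output.

    In the reducer, iterate over the file names received for each key, incrementing a counter and appending each file name as you go.

       // inside the loop over the values for this key
       file += " " + filename;
       // after the loop: the count followed by the file names
       textString = counter + file;
       output.collect(key, new Text(textString));
    

    This solved the problem.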
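
To make the two steps concrete, here is a minimal sketch of the same logic in plain Java, without the Hadoop runtime, so it can be run on its own. The class name WordFileCount and the methods map, reduce and run are hypothetical names made up for this illustration. In a real job the file name would come from the FileSplit call shown above rather than being passed in. Two assumptions go beyond the answer: words are lower-cased so that "Hadoop" and "hadoop" are counted together (as in the expected output), and a file name is listed only once per word even if the word occurs several times in that file.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical, framework-free simulation of the mapper and reducer logic.
class WordFileCount {

    // Map step: emit one (word, fileName) pair per token in the line.
    // Assumption: words are lower-cased so "Hadoop" and "hadoop" match.
    static List<String[]> map(String fileName, String line) {
        List<String[]> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new String[] {token.toLowerCase(), fileName});
            }
        }
        return pairs;
    }

    // Reduce step: count every value for the key and collect the file names.
    // Assumption: each file name is listed once, in first-seen order.
    static String reduce(List<String> fileNames) {
        int counter = 0;
        Set<String> files = new LinkedHashSet<>();
        for (String fileName : fileNames) {
            counter++;
            files.add(fileName);
        }
        return counter + " " + String.join(" ", files);
    }

    // Stand-in for the framework: run map on every line, group the emitted
    // file names by word (sorted by key), then run reduce on each group.
    static Map<String, String> run(Map<String, String> inputs) {
        Map<String, List<String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> input : inputs.entrySet()) {
            for (String line : input.getValue().split("\n")) {
                for (String[] pair : map(input.getKey(), line)) {
                    grouped.computeIfAbsent(pair[0], k -> new ArrayList<>())
                           .add(pair[1]);
                }
            }
        }
        Map<String, String> output = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> group : grouped.entrySet()) {
            output.put(group.getKey(), reduce(group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        Map<String, String> inputs = new LinkedHashMap<>();
        inputs.put("file1.txt", "welcome to Hadoop");
        inputs.put("file2.txt", "This is hadoop");
        run(inputs).forEach((word, value) -> System.out.println(word + " " + value));
    }
}
```

In an actual Hadoop job the grouping done in run is handled by the shuffle phase, so only the bodies of map and reduce would carry over, with the mapper emitting the file name as a Text value.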