My Mapper output:
1504652886 Geography
8904209587 Science
8904209587 Math
9341024668 English9
9341024668 Science
I am trying to write a reducer class now that will combine the common keys and generate an output as shown below:
1504652886 Geography
8904209587 Science, Math
9341024668 English9, Science
In the reducer class, I tried to build an ArrayList that holds all courses for a particular ID, but I am clearly doing something wrong. My code is below:
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            ArrayList<String> courses = new ArrayList<String>();
            for (Text x : values) {
                courses.add((Text) x);
            }
        }
    }
But I am missing something, and I get this error:
The method add(String) in the type ArrayList<String> is not applicable for the arguments (Text)
Can anybody please advise how to get the output?
Hadoop's Text class has a toString() method that returns a String representation of the object, so you could just replace the loop in your code with the following:
    for (Text x : values) {
        courses.add(x.toString());
    }
You will then need to convert the contents of the ArrayList back to a Text in order to emit it as the value (the key is the ID that reduce() already receives).
You are also missing the call to context.write(), which actually emits the output.
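To illustrate the ArrayList route, here is a minimal sketch in plain Java so that it runs without Hadoop on the classpath. The class CourseJoiner and the method joinCourses are hypothetical names that stand in for the body of reduce(). The method takes any Iterable because Text is not available in this standalone form.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: mirrors what reduce() would do with its values.
class CourseJoiner {
    // Collects each value's toString() into a list, then joins the entries with ", ".
    static String joinCourses(Iterable<?> values) {
        List<String> courses = new ArrayList<>();
        for (Object x : values) {
            courses.add(x.toString()); // the same call you would make on a Text
        }
        return String.join(", ", courses);
    }
}
```

Inside the real reducer you would pass values to this logic, wrap the resulting String in a Text, and hand it to context.write(key, ...). Note that Hadoop does not guarantee the order in which the values for a key arrive, so the courses may not be listed in the order the mapper emitted them.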
You could use a StringBuilder instead, which skips the intermediate list and should be faster than using an ArrayList:
    public static class Reduce extends Reducer<Text, Text, Text, Text> {
        // Reused across calls to avoid allocating a new Text for every key
        private final Text valueToEmit = new Text();

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            StringBuilder sb = new StringBuilder();
            for (Text x : values) {
                sb.append(x.toString()).append(", ");
            }
            // reduce() is only called for keys with at least one value,
            // so there is always a trailing ", " to drop
            valueToEmit.set(sb.substring(0, sb.length() - 2));
            context.write(key, valueToEmit);
        }
    }