java, hadoop, mapreduce

Text.set(string) not showing the right value in hadoop map reduce


I have a map function that emits data in the following form (the values of the keys are not important):

key: "somevalue"
value: "value \t comma separated values"

for example

key:"0"
value:"5\t1,2,3,4"

If I use this code:

Text debug;
for (Text val : values) {
    String[] segments = val.toString().split("\t");
    debug = new Text();
    debug.set(val.toString());
    context.write(key, debug);
}

I get the right output, such as

key value
0   8   1,2,4,5
0   2   0,4,5

But if I try this code, the output gets weird:

Text debug;
for (Text val : values) {
    String[] segments = val.toString().split("\t");
    debug = new Text();
    if(val.toString().split("\t").length > 1) {
        try{
            debug.set(val.toString().split("\t")[1]);
        }catch(Exception e) {
            debug.set("Exception");
        }
    }
    context.write(key, debug);
}

The expected output would be:

key  second part of value (after \t) 
1    2,3,4,5,6
1    4,5,6,6,7

However the output I get is this:

key Tab (tab character after key)
1TAB
1TAB
...
2TAB

If I replace the try...catch with if...else:

Text debug;
for (Text val : values) {
    String[] segments = val.toString().split("\t");
    debug = new Text();
    if(val.toString().split("\t").length > 1) {
        debug.set(val.toString().split("\t")[1]);
    } else {
        debug.set("only one");
    }
    context.write(key, debug);
}

This gives the result

0   only one
...
100 only one

What's going on? I checked in plain Java, and calling "1\t2".split("\t") gives me ["1", "2"] as expected.


Solution

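  • I found the problem: I was using the same class as both a combiner and a reducer. The combiner had already stripped everything up to the tab, so the values reaching the reducer contained no tab and split("\t") returned a single element. I just needed to use the class only as the reducer.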
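
To see why running the same logic as combiner and reducer produces "only one", here is a minimal sketch in plain Java with no Hadoop types. The class name CombinerDemo and the helper secondField are hypothetical; the helper only mirrors the if...else reducer from the question, and the second call stands in for the reducer receiving the combiner's output.

```java
// Hypothetical stand-alone demo; not part of the original job.
class CombinerDemo {
    // Same logic as the if...else reducer in the question, minus the Hadoop types.
    static String secondField(String value) {
        String[] segments = value.split("\t");
        return segments.length > 1 ? segments[1] : "only one";
    }

    public static void main(String[] args) {
        String mapOutput = "5\t1,2,3,4";

        // Reducer only: the logic runs once, on the mapper's value.
        System.out.println(secondField(mapOutput));   // prints "1,2,3,4"

        // Combiner + reducer: the logic runs again on its own output,
        // which no longer contains a tab.
        String combined = secondField(mapOutput);
        System.out.println(secondField(combined));    // prints "only one"
    }
}
```

The try...catch version fails the same way: on the second pass the length check is false, debug stays empty, and only the key and the tab separator are written. In the driver, the fix corresponds to keeping job.setReducerClass(...) and dropping the job.setCombinerClass(...) call for this class, since a combiner's output has to be valid input for the reducer.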