java, hadoop, mapreduce, hadoop2

Can I finish a map task early explicitly from code in Hadoop?


There are cases where I don't need to traverse every input record in a map task. For example, each mapper only needs to emit up to 200 records that satisfy certain conditions, and then it can quit.

Can I do this in Hadoop? I can't find a related method in the API docs.


Solution

  • You can probably achieve this by overriding the run method in the Mapper.

    The default run method looks like this:

    public void run(Context context) throws IOException, InterruptedException {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
    

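    This is how the framework calls the standard map() method, once per input record. In your own run method you could keep a counter of emitted records (for example, a field that map() increments each time it writes) and break out of the while loop once it reaches 200. Because cleanup(context) sits in the finally block, it still runs after the early exit.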
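
    A minimal sketch of the idea, written so it runs without Hadoop on the classpath. SimpleContext and SimpleMapper are hypothetical stand-ins that only mirror the shape of Mapper.Context and Mapper; LimitedMapper, the limit parameter, and the "keep" prefix condition are likewise made up for illustration. In a real job you would presumably extend org.apache.hadoop.mapreduce.Mapper and put the same loop condition in your overridden run(Context context).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for Mapper.Context: feeds input records and collects output.
class SimpleContext {
    private final Iterator<String> input;
    private String current;
    final List<String> output = new ArrayList<>();
    int recordsRead = 0;

    SimpleContext(List<String> records) {
        this.input = records.iterator();
    }

    // Advances to the next record; returns false when the input is exhausted.
    boolean nextKeyValue() {
        if (!input.hasNext()) {
            return false;
        }
        current = input.next();
        recordsRead++;
        return true;
    }

    String getCurrentValue() {
        return current;
    }

    void write(String value) {
        output.add(value);
    }
}

// Hypothetical stand-in for Mapper, using the same run() loop as quoted above.
class SimpleMapper {
    protected void setup(SimpleContext context) {}
    protected void map(String value, SimpleContext context) {}
    protected void cleanup(SimpleContext context) {}

    public void run(SimpleContext context) {
        setup(context);
        try {
            while (context.nextKeyValue()) {
                map(context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context);
        }
    }
}

// Emits at most `limit` matching records, then stops reading further input.
class LimitedMapper extends SimpleMapper {
    private final int limit;
    private int emitted = 0;

    LimitedMapper(int limit) {
        this.limit = limit;
    }

    @Override
    protected void map(String value, SimpleContext context) {
        if (value.startsWith("keep")) { // placeholder for "certain conditions"
            context.write(value);
            emitted++;
        }
    }

    @Override
    public void run(SimpleContext context) {
        setup(context);
        try {
            // Check the limit first so no extra record is read once it is reached.
            while (emitted < limit && context.nextKeyValue()) {
                map(context.getCurrentValue(), context);
            }
        } finally {
            cleanup(context); // still runs after the early exit
        }
    }
}
```

    Calling new LimitedMapper(200).run(context) on a SimpleContext would stop as soon as 200 matching records have been written. The counter tracks emitted records rather than input records, since the limit in the question applies to what the mapper emits.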