There are cases where I don't need to traverse every input record in a map task. For example, each mapper only needs to emit up to 200 records that satisfy certain conditions, and then it can quit.
Can I do this in Hadoop? I can't find a related method in the API docs.
You can probably achieve this by overriding the run method in the Mapper.
The run method currently looks like:
public void run(Context context) throws IOException, InterruptedException {
    setup(context);
    try {
        while (context.nextKeyValue()) {
            map(context.getCurrentKey(), context.getCurrentValue(), context);
        }
    } finally {
        cleanup(context);
    }
}
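So this is how the standard map() method gets called for each record. You could add a counter in there and break out of the while loop once it hits 200. If the limit applies only to records that satisfy your conditions, increment the counter in map() when you emit rather than on every loop iteration.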
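To make the idea concrete, here is a rough sketch of the early-exit loop. It does not use the Hadoop classes, so that it compiles on its own: RecordContext, ListContext, EarlyExitSketch, the "keep" condition, and the demo helper are all hypothetical stand-ins, and only the nextKeyValue()/getCurrentValue()/write() shape mirrors the real Context. In an actual job you would put the same loop condition into an overridden run(Context) on your Mapper subclass and keep the setup()/cleanup() calls as in the original method.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class EarlyExitSketch {
    // Hypothetical stand-in for the parts of Hadoop's Mapper.Context used here.
    interface RecordContext {
        boolean nextKeyValue();
        String getCurrentValue();
        void write(String value);
    }

    // In-memory context so the sketch can run without a cluster (assumption).
    static final class ListContext implements RecordContext {
        private final Iterator<String> it;
        private String current;
        final List<String> output = new ArrayList<>();

        ListContext(List<String> input) { this.it = input.iterator(); }

        public boolean nextKeyValue() {
            if (!it.hasNext()) return false;
            current = it.next();
            return true;
        }
        public String getCurrentValue() { return current; }
        public void write(String value) { output.add(value); }
    }

    static final int LIMIT = 200; // per-mapper cap from the question

    private int emitted = 0; // records written so far by this mapper
    int recordsRead = 0;     // only tracked to show that the loop stops early

    // Same shape as Mapper.run(), with the limit checked before reading more input.
    void run(RecordContext context) {
        while (emitted < LIMIT && context.nextKeyValue()) {
            recordsRead++;
            map(context.getCurrentValue(), context);
        }
    }

    // Emits only matching records; the counter moves only when something is written.
    void map(String value, RecordContext context) {
        if (value.startsWith("keep")) { // placeholder for "certain conditions"
            context.write(value);
            emitted++;
        }
    }

    // Hypothetical helper: every other record matches; returns {emitted, read}.
    static int[] demo(int totalRecords) {
        List<String> input = new ArrayList<>();
        for (int i = 0; i < totalRecords; i++) {
            input.add((i % 2 == 0 ? "keep-" : "skip-") + i);
        }
        ListContext context = new ListContext(input);
        EarlyExitSketch mapper = new EarlyExitSketch();
        mapper.run(context);
        return new int[] { context.output.size(), mapper.recordsRead };
    }

    public static void main(String[] args) {
        int[] result = demo(1000);
        System.out.println("emitted=" + result[0] + ", read=" + result[1]);
    }
}
```

With 1000 input records where every other one matches, this stops after reading 399 records and emitting 200. Checking the counter before calling nextKeyValue() avoids pulling one extra record once the limit is reached. Note that the cap is per mapper, so a job with several map tasks can still emit up to 200 records from each of them.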