I'm using Hadoop 2.2.0, and when I run my map tasks I get the following error:
attempt_xxx Timed out after 1800000 seconds
(it's 1800000 because I have changed the mapreduce.task.timeout configuration).
Below is my map code:
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MapTask extends Mapper<LongWritable, Text, NullWritable, Text>
{
    ContentOfFiles fileContent = new ContentOfFiles();

    @Override
    public void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException
    {
        String line = value.toString();
        String[] splits = line.split("\\t");
        List<String> sourceList = Arrays.asList(splits);
        String finalOutput = fileContent.getContentOfFile(sourceList);
        context.write(NullWritable.get(), new Text(finalOutput));
    }
}
Here is my ContentOfFiles class:
public class ContentOfFiles
{
    public String getContentOfFile(List<String> sourceList)
    {
        String returnContentOfFile = "";
        for (String source : sourceList)
        {
            // Open the file named by source, get the content and append it
            // to the String returnContentOfFile
        }
        return returnContentOfFile;
    }
}
When I run my map tasks, I get the error saying
attempt_xxx Timed out after 1800000 seconds.
What I want to know is how I can tell Hadoop that my tasks are still running. I call the ContentOfFiles class inside my map, so is there a way of telling Hadoop from inside it that the task is still making progress? I have tried changing mapreduce.task.timeout to 1800000, and it still gives me the same error.
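For reference, the timeout change was roughly this (a sketch assuming it is set in the job driver; putting the same property in mapred-site.xml should behave the same way):
// In the job driver
Configuration conf = new Configuration();
// mapreduce.task.timeout is given in milliseconds, so 1800000 means 30 minutes
conf.setLong("mapreduce.task.timeout", 1800000L);
Job job = Job.getInstance(conf);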
Once again, I'm using Hadoop 2.2, so it would be great if someone could tell me how to handle this issue with the new API.
You could try adding context.progress(); after each long operation in the mapper. As I understand it, the best place for it is at the end of the for loop:
public String getContentOfFile(List<String> sourceList, Context context) {
    String returnContentOfFile = "";
    for (String source : sourceList) {
        // Open the files, get the content and append it to returnContentOfFile
        context.progress(); // report progress so the framework knows the task is alive
    }
    return returnContentOfFile;
}
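For completeness, here is a minimal sketch of the calling side, assuming the modified signature above (the mapper simply passes its own Context down to the helper):
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {
    String line = value.toString();
    List<String> sourceList = Arrays.asList(line.split("\\t"));
    // Hand the mapper's Context to getContentOfFile so it can call
    // context.progress() while it is busy reading the files.
    String finalOutput = fileContent.getContentOfFile(sourceList, context);
    context.write(NullWritable.get(), new Text(finalOutput));
}
Calling context.setStatus("reading files...") at the same points should also count as a progress report, and it additionally shows a status message in the web UI.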