
Java Child Error in Hadoop TaskRunner


I'm trying to run the Hadoop grep example in a pseudo-distributed configuration using Hadoop 0.22.0 on Windows 7 with Cygwin. The example works fine in standalone mode, but in pseudo-distributed mode it fails with the following output:

$ bin/hadoop jar hadoop-mapred-examples-0.22.0.jar grep input output 'dfs[a-z.]+'

12/05/15 08:27:31 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/05/15 08:27:31 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
12/05/15 08:27:31 INFO input.FileInputFormat: Total input paths to process : 1
12/05/15 08:27:32 INFO mapreduce.JobSubmitter: number of splits:1
12/05/15 08:27:33 INFO mapreduce.Job: Running job: job_201205150826_0001
12/05/15 08:27:34 INFO mapreduce.Job:  map 0% reduce 0%
12/05/15 08:27:47 INFO mapreduce.Job: Task Id : attempt_201205150826_0001_m_000002_0, Status : FAILED
java.lang.Throwable: Child Error
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:225)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
    at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:212)

12/05/15 08:27:47 WARN mapreduce.Job: Error reading task outputhttp://xxx.xxx.xxx:50060/tasklog?plaintext=true&attemptid=attempt_201205150826_0001_m_000002_0&filter=stdout

Does anyone know what could be causing the Java Child Error, or why the task output cannot be read?

I get the following error in the TaskTracker log:

Failed to retrieve stdout log for task: attempt_201205151356_0001_m_000002_0
java.io.FileNotFoundException: C:\cygwin\usr\local\hadoop-0.22.0\logs\userlog\job_201205151356_0001\attempt_201205151356_0001_m_000002_0\log.index (The system cannot find the file specified)

Solution

  • Not sure if this is still relevant, as Hadoop is now at version 1.0.x.

    If it helps, I've managed to port 1.0.1 to Cygwin 1.7 on Windows 7 with JDK 1.7 (x64).

    There are many issues at work here. They revolve around path confusion in the shell scripts and wrappers and in the Hadoop core Java code, plus the non-trivial fact that Java doesn't understand Cygwin symlinks.

    Here are instructions for the complete working fix:

    http://en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin

    It's also on SourceForge, though I haven't uploaded the patched code yet because I want to port the latest version first (this was for 1.0.1).

    http://sourceforge.net/p/win-hadoop/wiki/Hadoop-on-Cygwin/
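
To illustrate the path confusion mentioned in the answer: a Windows JVM expects Windows-style paths, while the Hadoop shell scripts running under Cygwin work with POSIX-style paths. The sketch below is not taken from the linked instructions. It only shows one common way to bridge the two, using Cygwin's `cygpath -w`. The helper name `to_jvm_path` is hypothetical, and the directory is assumed from the TaskTracker log in the question.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): convert a Cygwin POSIX path
# into the Windows form that a Windows JVM can open.
to_jvm_path() {
    if command -v cygpath >/dev/null 2>&1; then
        # Under Cygwin, -w prints the Windows form of the path.
        cygpath -w "$1"
    else
        # Not running under Cygwin: pass the path through unchanged.
        printf '%s\n' "$1"
    fi
}

# Assumed install directory, based on the path in the TaskTracker log.
HADOOP_DIR=/usr/local/hadoop-0.22.0
JVM_HADOOP_DIR=$(to_jvm_path "$HADOOP_DIR")

echo "shell path: $HADOOP_DIR"
echo "JVM path:   $JVM_HADOOP_DIR"
```

Comparing the two printed paths is a quick way to see which form a given script is handing to Java. Whether this alone resolves the Child Error is not established here; the answer above describes several interacting issues.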