I'm trying to run the Hadoop grep example in a pseudo-distributed configuration using Hadoop 0.22.0 on Windows 7 with Cygwin. The example works fine in standalone mode, but when run in pseudo-distributed mode it gives the following output:
$ bin/hadoop jar hadoop-mapred-examples-0.22.0.jar grep input output 'dfs[a-z.]+'
12/05/15 08:27:31 WARN conf.Configuration: mapred.used.genericoptionsparser is deprecated. Instead, use mapreduce.client.genericoptionsparser.used
12/05/15 08:27:31 WARN mapreduce.JobSubmitter: No job jar file set. User classes may not be found. See Job or Job#setJar(String).
12/05/15 08:27:31 INFO input.FileInputFormat: Total input paths to process : 1
12/05/15 08:27:32 INFO mapreduce.JobSubmitter: number of splits:1
12/05/15 08:27:33 INFO mapreduce.Job: Running job: job_201205150826_0001
12/05/15 08:27:34 INFO mapreduce.Job: map 0% reduce 0%
12/05/15 08:27:47 INFO mapreduce.Job: Task Id : attempt_201205150826_0001_m_000002_0, Status : FAILED
java.lang.Throwable: Child Error
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:225)
Caused by: java.io.IOException: Task process exit with nonzero status of 1.
at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:212)
12/05/15 08:27:47 WARN mapreduce.Job: Error reading task outputhttp://xxx.xxx.xxx:50060/tasklog?plaintext=true&attemptid=attempt_201205150826_0001_m_000002_0&filter=stdout
Does anyone know what could be causing the Java Child Error, or why the task output cannot be read?
I get the following error in the TaskTracker log:
Failed to retrieve stdout log for task: attempt_201205151356_0001_m_000002_0
java.io.FileNotFoundException: C:\cygwin\usr\local\hadoop-0.22.0\logs\userlog\job_201205151356_0001\attempt_201205151356_0001_m_000002_0\log.index (The system cannot find the file specified)
Not sure if this is still relevant, as Hadoop is now at version 1.0.x.
If it helps, I've managed to port 1.0.1 to Cygwin 1.7 on Windows 7 with JDK 1.7 x64.
There are many issues at work here. They revolve around path confusion in the shell scripts and wrappers and in the Hadoop core Java code, plus the non-trivial fact that Java doesn't understand Cygwin symlinks.
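To give a rough idea of the path confusion, here is a minimal sketch of the translation that has to happen between the Cygwin side and the Java side. It is only an illustration, not the patch from the instructions linked below. The function names `to_windows_path` and `to_backslashes` and the variable `CYGWIN_ROOT_WIN` are hypothetical, the `C:\cygwin` root is assumed from the path in the TaskTracker log above, and the default `/cygdrive` prefix is assumed. On a real Cygwin install, `cygpath -w` does this properly and also resolves Cygwin symlinks.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, not part of Hadoop or Cygwin.
# On a real Cygwin install, prefer `cygpath -w <path>`.

# Assumed Cygwin install root (matches the path in the TaskTracker log).
CYGWIN_ROOT_WIN='C:\cygwin'

# Turn forward slashes into backslashes.
to_backslashes() {
  printf '%s' "$1" | tr '/' '\\'
}

# Map a Cygwin (POSIX-style) path to the Windows path a JVM would expect.
to_windows_path() {
  local p=$1 drive rest
  if [[ $p =~ ^/cygdrive/([A-Za-z])(/.*)?$ ]]; then
    # /cygdrive/c/foo -> C:\foo
    drive=$(printf '%s' "${BASH_REMATCH[1]}" | tr 'a-z' 'A-Z')
    rest=${BASH_REMATCH[2]:-/}
    printf '%s:%s\n' "$drive" "$(to_backslashes "$rest")"
  elif [[ $p == /* ]]; then
    # Any other absolute POSIX path lives under the Cygwin root.
    printf '%s%s\n' "$CYGWIN_ROOT_WIN" "$(to_backslashes "$p")"
  else
    # Relative path: only the separators change.
    printf '%s\n' "$(to_backslashes "$p")"
  fi
}

to_windows_path /usr/local/hadoop-0.22.0/logs   # C:\cygwin\usr\local\hadoop-0.22.0\logs
to_windows_path /cygdrive/c/tmp                 # C:\tmp
```

A string mapping like this cannot follow Cygwin symlinks, which is one reason a JVM handed a Cygwin path can fail to find files that the shell sees without trouble.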
Here are instructions for the complete working fix:
http://en.wikisource.org/wiki/User:Fkorning/Code/Hadoop-on-Cygwin
It's also on SourceForge, though I haven't uploaded the patched code yet because I want to port the latest version first (this was 1.0.1).