Hadoop Pipes Wordcount example: NullPointerException in LocalJobRunner

I am trying to run the word count example from a tutorial about Hadoop Pipes.

The program compiles without problems, but when I run it the job fails with a NullPointerException. I have tried several approaches and read many similar questions, but could not find an actual solution. Note: I am running on a single machine in a pseudo-distributed environment.

hadoop pipes -D -D -input /input -output /output -program /bin/wordcount
DEPRECATED: Use of this script to execute mapred command is deprecated.
Instead use the mapred command for it.

15/02/18 01:09:02 INFO Configuration.deprecation: is deprecated. Instead, use dfs.metrics.session-id
15/02/18 01:09:02 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/02/18 01:09:02 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/02/18 01:09:03 WARN mapreduce.JobSubmitter: No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
15/02/18 01:09:04 INFO mapred.FileInputFormat: Total input paths to process : 1
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: number of splits:1
15/02/18 01:09:04 INFO Configuration.deprecation: is deprecated. Instead, use mapreduce.pipes.isjavarecordreader
15/02/18 01:09:04 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalDistributedCacheManager: Localized hdfs://localhost:9000/bin/wordcount as file:/tmp/hadoop-abdulrahman/mapred/local/1424214545411/wordcount
15/02/18 01:09:06 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter set in config null
15/02/18 01:09:06 INFO mapreduce.Job: Running job: job_local143452495_0001
15/02/18 01:09:06 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapred.FileOutputCommitter
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Waiting for map tasks
15/02/18 01:09:06 INFO mapred.LocalJobRunner: Starting task: attempt_local143452495_0001_m_000000_0
15/02/18 01:09:06 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
15/02/18 01:09:06 INFO mapred.MapTask: Processing split: hdfs://localhost:9000/input/data.txt:0+68
15/02/18 01:09:07 INFO mapred.MapTask: numReduceTasks: 1
15/02/18 01:09:07 INFO mapreduce.Job: Job job_local143452495_0001 running in uber mode : false
15/02/18 01:09:07 INFO mapreduce.Job:  map 0% reduce 0%
15/02/18 01:09:07 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
15/02/18 01:09:07 INFO mapred.MapTask: 100
15/02/18 01:09:07 INFO mapred.MapTask: soft limit at 83886080
15/02/18 01:09:07 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
15/02/18 01:09:07 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
15/02/18 01:09:07 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
15/02/18 01:09:08 INFO mapred.LocalJobRunner: map task executor complete.
15/02/18 01:09:08 WARN mapred.LocalJobRunner: job_local143452495_0001
java.lang.Exception: java.lang.NullPointerException
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(
    at org.apache.hadoop.mapred.LocalJobRunner$
Caused by: java.lang.NullPointerException
    at org.apache.hadoop.mapred.pipes.Application.<init>(
    at org.apache.hadoop.mapred.MapTask.runOldMapper(
    at org.apache.hadoop.mapred.LocalJobRunner$Job$
    at java.util.concurrent.Executors$
    at java.util.concurrent.ThreadPoolExecutor.runWorker(
    at java.util.concurrent.ThreadPoolExecutor$
15/02/18 01:09:08 INFO mapreduce.Job: Job job_local143452495_0001 failed with state FAILED due to: NA
15/02/18 01:09:08 INFO mapreduce.Job: Counters: 0
Exception in thread "main" Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(
    at org.apache.hadoop.mapred.pipes.Submitter.runJob(
    at org.apache.hadoop.mapred.pipes.Submitter.main(

I downloaded the Hadoop source code and traced where the exception is thrown. It appears to occur during initialization, so the code inside the mapper/reducer isn't really the problem.

The function in Hadoop that produces the exception is this one:

    /** Run a set of tasks and waits for them to complete. */
    private void runTasks(List<RunnableWithThrowable> runnables,
        ExecutorService service, String taskType) throws Exception {
      // Start populating the executor with work units.
      // They may begin running immediately (in other threads).
      for (Runnable r : runnables) {
        service.submit(r);
      }

      try {
        service.shutdown(); // Instructs queue to drain.

        // Wait for tasks to finish; do not use a time-based timeout.
        // (See
        LOG.info("Waiting for " + taskType + " tasks");
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
      } catch (InterruptedException ie) {
        // Cancel all threads.
        service.shutdownNow();
        throw ie;
      }

      LOG.info(taskType + " task executor complete.");

      // After waiting for the tasks to complete, if any of these
      // have thrown an exception, rethrow it now in the main thread context.
      for (RunnableWithThrowable r : runnables) {
        if (r.storedException != null) {
          throw new Exception(r.storedException);
        }
      }
    }

The problem, though, is that this method stores the exception and rethrows it later, which prevents me from seeing where the exception actually originated.
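For what it's worth, `new Exception(r.storedException)` keeps the stored exception as the cause of the wrapper, which is why the log above still prints a "Caused by: java.lang.NullPointerException" section pointing at pipes.Application.<init>. Below is a minimal sketch of the same store-and-rethrow pattern outside Hadoop. The names StoringTask, runAll and rootCause are hypothetical stand-ins for illustration, not Hadoop classes or methods.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StoredExceptionDemo {

    /** Hypothetical stand-in for a task that remembers its own failure. */
    static abstract class StoringTask implements Runnable {
        volatile Throwable storedException;

        abstract void doWork() throws Exception;

        @Override
        public void run() {
            try {
                doWork();
            } catch (Throwable t) {
                storedException = t; // remembered for later, not lost
            }
        }
    }

    /** Same shape as runTasks: run everything, then rethrow the first stored failure. */
    static void runAll(List<StoringTask> tasks) throws Exception {
        ExecutorService service = Executors.newFixedThreadPool(2);
        for (StoringTask t : tasks) {
            service.submit(t);
        }
        service.shutdown();
        service.awaitTermination(Long.MAX_VALUE, TimeUnit.NANOSECONDS);
        for (StoringTask t : tasks) {
            if (t.storedException != null) {
                // The original throwable becomes the cause of the wrapper.
                throw new Exception(t.storedException);
            }
        }
    }

    /** Walks the cause chain down to the innermost throwable. */
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        List<StoringTask> tasks = new ArrayList<>();
        tasks.add(new StoringTask() {
            @Override
            void doWork() {
                String token = null;
                token.length(); // deliberately triggers a NullPointerException
            }
        });
        try {
            runAll(tasks);
        } catch (Exception e) {
            Throwable root = rootCause(e);
            System.out.println("wrapper:   " + e.getClass().getName());
            System.out.println("root:      " + root.getClass().getName());
            System.out.println("thrown at: " + root.getStackTrace()[0]);
        }
    }
}
```

The first stack frame of the root cause is the place to look. In the trace above that frame is the Application constructor in the pipes package, not runTasks.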

How do I resolve this issue?


  • After a lot of research, I found that the problem was caused by this line in pipes/ (line 104):

    byte[] password = jobToken.getPassword();

    I changed the code as follows and recompiled Hadoop:

    byte[] password = "no password".getBytes();
    if (jobToken != null) {
        password = jobToken.getPassword();
    }

    I got this from here

    This solved the problem and my program now runs, but I am facing another problem: the program hangs at map 0% reduce 0%. I will open another topic for that question.

    Thank you,
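The null check from the answer above can be tried in isolation. This is only a sketch: JobToken here is a hypothetical stand-in written for the example, not Hadoop's real token class, and passwordOrDefault is an invented helper name. It shows why calling getPassword() on a missing token throws, and how the fallback value avoids that.

```java
import java.nio.charset.StandardCharsets;

public class TokenPasswordGuard {

    /** Hypothetical stand-in for the job token; not Hadoop's real class. */
    static final class JobToken {
        private final byte[] password;

        JobToken(byte[] password) {
            this.password = password;
        }

        byte[] getPassword() {
            return password;
        }
    }

    /** Returns the token's password, or a placeholder when no token exists. */
    static byte[] passwordOrDefault(JobToken jobToken) {
        byte[] password = "no password".getBytes(StandardCharsets.UTF_8);
        if (jobToken != null) {
            password = jobToken.getPassword();
        }
        return password;
    }

    public static void main(String[] args) {
        // No token available: the placeholder is used instead of throwing.
        byte[] fromNull = passwordOrDefault(null);
        System.out.println(new String(fromNull, StandardCharsets.UTF_8));

        // Token available: its own password is returned.
        JobToken token = new JobToken("secret".getBytes(StandardCharsets.UTF_8));
        byte[] fromToken = passwordOrDefault(token);
        System.out.println(new String(fromToken, StandardCharsets.UTF_8));
    }
}
```

The guard only avoids the crash when the token is missing. It does not explain why the token is missing in the first place, so the placeholder password is best treated as a workaround.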