java, hadoop, mapreduce, distributed-cache

How to set Hadoop DistributedCache?


When I run Hadoop code that adds a third-party jar to the job, using the following method:

public static void addTmpJar(String jarPath, JobConf conf) throws IOException {
    System.setProperty("path.separator", ":");
    FileSystem fs = FileSystem.getLocal(conf);
    String newJarPath = new Path(jarPath).makeQualified(fs).toString();
    String tmpjars = conf.get("tmpjars");
    if (tmpjars == null || tmpjars.length() == 0) {
        conf.set("tmpjars", newJarPath);
    } else {
        conf.set("tmpjars", tmpjars + "," + newJarPath);
    }
}

I get the following exception:

Error initializing attempt_201405281453_0053_m_000002_0:

org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find any valid local directory for taskTracker/hadoop/distcache/-7315515059647727905_-860888033_1107570546/nn.hadoop.dev/tmp/hadoop-hadoop/mapred/staging/hadoop/.staging/job_201405281453_0053/libjars/mahout-core-0.8-job.jar
    at org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(LocalDirAllocator.java:381)
    at org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(LocalDirAllocator.java:146)
    at org.apache.hadoop.filecache.TrackerDistributedCacheManager.getLocalCache(TrackerDistributedCacheManager.java:173)
    at org.apache.hadoop.filecache.TaskDistributedCacheManager.setupCache(TaskDistributedCacheManager.java:187)
    at org.apache.hadoop.mapred.TaskTracker$4.run(TaskTracker.java:1320)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.TaskTracker.initializeJob(TaskTracker.java:1311)
    at org.apache.hadoop.mapred.TaskTracker.localizeJob(TaskTracker.java:1226)
    at org.apache.hadoop.mapred.TaskTracker$5.run(TaskTracker.java:2603)
    at java.lang.Thread.run(Thread.java:744)

Can anyone tell me how to solve this problem? Thanks!


Solution

  • From the command line you can add a jar to the DistributedCache with -libjars. The only prerequisite is that your MapReduce program implements Tool and is launched through ToolRunner, which uses GenericOptionsParser; the parser takes care of adding the jar to the cache.

    This page explains the above in more detail
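
    To make the mechanism a little more concrete, here is a minimal JDK-only sketch of what a generic-options parser does with -libjars: it peels the option off the command line, records the jar list in the job settings, and passes only the remaining arguments on to the job. This is a toy model, not Hadoop's implementation. The class names LibJarsSketch and Parsed are hypothetical, and it assumes the jar list ends up under the same "tmpjars" key that the question's code sets by hand. In a real job you would not write this yourself; launching the Tool through ToolRunner does the parsing for you.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (JDK only, NOT Hadoop's code) of how a generic-options parser
// can separate "-libjars" from the arguments meant for the job itself.
final class LibJarsSketch {

    /** Hypothetical result type: settings kept by the framework, plus the args left for the job. */
    static final class Parsed {
        final Map<String, String> conf = new HashMap<>();
        final List<String> remaining = new ArrayList<>();
    }

    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        for (int i = 0; i < args.length; i++) {
            if ("-libjars".equals(args[i]) && i + 1 < args.length) {
                // Consume the option value and append it to any jars already recorded.
                // Assumption: the list lives under "tmpjars", as in the question's code.
                // Unlike this sketch, a real parser would also validate and qualify the paths.
                String jars = args[++i];
                parsed.conf.merge("tmpjars", jars, (old, added) -> old + "," + added);
            } else {
                // Everything else is left untouched for the job's own run(String[]) logic.
                parsed.remaining.add(args[i]);
            }
        }
        return parsed;
    }

    public static void main(String[] args) {
        Parsed p = parse(new String[] {"-libjars", "mahout-core-0.8-job.jar", "input", "output"});
        System.out.println("tmpjars   = " + p.conf.get("tmpjars"));
        System.out.println("remaining = " + p.remaining);
    }
}
```

    The point of the sketch is that with -libjars the job code never needs to touch "tmpjars" directly, so a hand-written helper like addTmpJar becomes unnecessary.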