java hadoop mapreduce distributed-cache

Distributed Cache in a Hadoop Reducer


I want to hold File A in the memory of reducer1 and File B in the memory of reducer2. Is this possible using the Distributed Cache in Hadoop? If not, is there another way to achieve this?

Thanks


Solution

  • Yes, if the files are reasonably small, you can put them in the distributed cache. See http://developer.yahoo.com/hadoop/tutorial/module5.html#auxdata; it might be useful to you.

    In the following portion of code, it is up to you which file each reducer works on: every task sees all the cached files and loads only the one whose name it matches.

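    // Local paths of every file registered in the distributed cache for this job
    Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
    if (null != cacheFiles && cacheFiles.length > 0) {
      for (Path cachePath : cacheFiles) {
        // Load only the file this task needs, then stop searching
        if (cachePath.getName().equals(stopwordCacheName)) {
          loadStopWords(cachePath);
          break;
        }
      }
    }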

    See if it helps
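
To make the "which file in which reducer" part concrete, here is a minimal sketch in plain Java, with no Hadoop dependency, so it only illustrates the selection logic. The class `CacheFileSelector`, the file names `fileA.txt` and `fileB.txt`, and the partition-to-file mapping are all hypothetical. In a real job you would presumably call this from the reducer's `setup()` method, pass in the local cache paths, and take the partition number from something like `context.getTaskAttemptID().getTaskID().getId()`; treat that wiring as an assumption to check against your Hadoop version.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Hadoop: picks one cached file per reducer.
class CacheFileSelector {

    // Assumed mapping from reducer partition number to cache file name.
    static final Map<Integer, String> FILE_FOR_PARTITION = new HashMap<>();
    static {
        FILE_FOR_PARTITION.put(0, "fileA.txt"); // first reducer holds File A
        FILE_FOR_PARTITION.put(1, "fileB.txt"); // second reducer holds File B
    }

    // Returns the local cache path assigned to this partition,
    // or null if no file is assigned or the file is not in the cache.
    static Path select(List<Path> localCacheFiles, int partition) {
        String wanted = FILE_FOR_PARTITION.get(partition);
        if (wanted == null || localCacheFiles == null) {
            return null;
        }
        for (Path candidate : localCacheFiles) {
            Path name = candidate.getFileName();
            if (name != null && name.toString().equals(wanted)) {
                return candidate;
            }
        }
        return null;
    }

    // Reads the selected file fully into memory, one list entry per line.
    static List<String> load(Path file) throws IOException {
        return Files.readAllLines(file, StandardCharsets.UTF_8);
    }
}
```

Each reducer would call `select(...)` once with its own partition number and keep the result of `load(...)` in a field, so only the file it needs ends up in its memory. Note that the distributed cache still copies every cached file to each node's local disk; the selection only controls what is read into memory.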