java · spark-streaming · hadoop-yarn · hadoop2 · distributed-cache

Yarn Distributed cache, no mapper/reducer


I am unable to access files in the distributed cache in Hadoop 2.6. Below is a code snippet. I am attempting to place a file, pattern.properties, whose path is passed in args[0], into the YARN distributed cache:

Configuration conf1 = new Configuration();
Job job = Job.getInstance(conf1);
// Note: DistributedCache is deprecated in Hadoop 2.x; Job#addCacheFile is the current API
DistributedCache.addCacheFile(new URI(args[0]), conf1);

Also, I am trying to access the file in the cache using the code below:

Context context = null;
URI[] cacheFiles = context.getCacheFiles();  // Error at this line: context is null, so this call throws
System.out.println(cacheFiles);

But I am getting the following error at the line marked above:

java.lang.NullPointerException

I am not using a Mapper class. It's just Spark Streaming code that needs to access a file on the cluster. I want the file to be distributed across the cluster, but I can't read it from HDFS.


Solution

  • I don't know whether I understood your question correctly.

    We had some local files which we needed to access in Spark Streaming jobs.

    We used this option:

    time spark-submit --files /user/dirLoc/log4j.properties#log4j.properties 'rest other options'
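
    Once the file is shipped with --files, a task can resolve the local copy through SparkFiles.get using the alias after the '#'. A minimal sketch, assuming YARN and made-up class and app names:

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.io.FileInputStream;
    import java.util.Properties;

    public class ReadShippedFile {
        public static void main(String[] args) throws Exception {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("read-shipped-file"));
            // --files localizes the file on every node; SparkFiles.get
            // resolves the absolute path of the local copy by its alias.
            sc.parallelize(java.util.Arrays.asList(1)).foreach(x -> {
                Properties props = new Properties();
                try (FileInputStream in =
                         new FileInputStream(SparkFiles.get("log4j.properties"))) {
                    props.load(in); // read it like any local file
                }
            });
            sc.stop();
        }
    }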

    Another way we tried was SparkContext.addFile(), sketched below.
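
    A minimal sketch of that approach, assuming the properties file path arrives in args[0] and keeps the base name pattern.properties from the question (class and app names are made up):

    import org.apache.spark.SparkConf;
    import org.apache.spark.SparkFiles;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.io.FileInputStream;
    import java.util.Properties;

    public class AddFileExample {
        public static void main(String[] args) throws Exception {
            JavaSparkContext sc =
                new JavaSparkContext(new SparkConf().setAppName("addfile-example"));
            // Distribute the file (a local path or an hdfs:// URI) to every node.
            sc.addFile(args[0]);
            // On the driver or inside a task, resolve the local copy by name.
            Properties props = new Properties();
            try (FileInputStream in =
                     new FileInputStream(SparkFiles.get("pattern.properties"))) {
                props.load(in);
            }
            System.out.println(props);
            sc.stop();
        }
    }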