My MapReduce jobs run fine on my local VM, but on Azure I get an "Input path not found" error.
I have two sets of mapper and reducer functions; the output of the first reducer goes into a temp folder that serves as the input to the second mapper.
FileInputFormat.addInputPath(job, new Path(args[0]));
FileSystem.get(conf).delete(new Path("file:///tmp/inter/"),true);
FileOutputFormat.setOutputPath(job, new Path("file:///tmp/inter/"));
boolean complete = job.waitForCompletion(true);
Job job2 = Job.getInstance(conf, "Q4a");
job2.setJarByClass(Q4a.class);
job2.setMapperClass(TokenizerMapper2.class);
job2.setCombinerClass(CountReducer2.class);
job2.setReducerClass(CountReducer2.class);
job2.setOutputKeyClass(Text.class);
job2.setOutputValueClass(Text.class);
// FileInputFormat.addInputPath(job2, new Path("file:///tmp/inter/part*"));
FileInputFormat.addInputPath(job2, new Path("file:///tmp/inter/"));
FileOutputFormat.setOutputPath(job2, new Path(args[1]));
System.exit(job2.waitForCompletion(true) ? 0 : 1);
The first mapper and reducer execute fully, and then the error is thrown. Line 58 is the last line in the pasted code, but I believe the error comes from the temp input path. Do I need to refer to temp files differently on Azure? Any help is much appreciated, thank you.
Okay, I just solved this. It looks like HDInsight does not support the folder structure I specified. I changed the intermediate path to simply "out" instead of "file:///tmp/inter/" and it worked.
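A likely explanation, offered as a guess rather than a confirmed diagnosis: "file:///tmp/inter/" carries an explicit file scheme, so it refers to the local disk of whichever node touches it, while a scheme-less path such as "out" is resolved against the cluster's default file system, which every task can see. The sketch below illustrates only that difference, using plain java.net.URI rather than Hadoop's own Path class. The default file system URI (hdfs://namenode:8020/user/alice/) and the names PathSchemeDemo and qualify are made up for the example; an HDInsight cluster would have its own default.

```java
import java.net.URI;

// Illustration only: this uses java.net.URI, not Hadoop's Path class,
// to show how a path with a scheme differs from one without.
class PathSchemeDemo {

    // Hypothetical default file system plus working directory (an assumption).
    static final URI DEFAULT_FS = URI.create("hdfs://namenode:8020/user/alice/");

    // A path that already has a scheme is returned unchanged;
    // a scheme-less path is resolved against the default file system.
    static URI qualify(String path) {
        return DEFAULT_FS.resolve(URI.create(path));
    }

    public static void main(String[] args) {
        // Keeps its file scheme, so it stays tied to a node's local disk.
        System.out.println(qualify("file:///tmp/inter/"));
        // No scheme, so it lands on the shared default file system.
        System.out.println(qualify("out"));
    }
}
```

If that guess is right, any scheme-less intermediate path should behave like "out" did, as long as both jobs use the same string for the first job's output and the second job's input.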