I have chained 2 mappers followed by 1 reducer. Is it possible to write the intermediate outputs (the output of each mapper in the chain) to HDFS? I tried setting the OutputPath for each, but it doesn't seem to work. Now I'm not sure whether it can be done at all. Any suggestions?
Map output is always written to disk, but when the job has a reducer those files are only temp files (they sit on the task node's local disk, not in HDFS) and they get deleted after job completion. If you need the map output, you have to chain two jobs: one job with no reducer, whose map output is written to HDFS as the job output, followed by a job with a reducer. Or, if you are comfortable writing HDFS files from inside a map task, that is also possible.
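The two-job approach can be sketched as a driver like the one below. This is only a minimal sketch, assuming Hadoop 2 or later and the `org.apache.hadoop.mapreduce` API; the class name `TwoJobDriver`, the three path arguments, and the use of the identity `Mapper` and `Reducer` base classes as stand-ins for your own classes are all assumptions for illustration.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

// Hypothetical driver: job 1 is map-only and persists its output in HDFS,
// job 2 reads that output and runs the reducer.
public class TwoJobDriver {
    public static void main(String[] args) throws Exception {
        Path input = new Path(args[0]);
        Path intermediate = new Path(args[1]); // map output you want to keep
        Path output = new Path(args[2]);
        Configuration conf = new Configuration();

        // Job 1: no reducer, so the map output is written straight to HDFS.
        Job first = Job.getInstance(conf, "map-only");
        first.setJarByClass(TwoJobDriver.class);
        first.setMapperClass(Mapper.class); // identity mapper; replace with your own
        first.setNumReduceTasks(0);
        first.setOutputKeyClass(LongWritable.class);
        first.setOutputValueClass(Text.class);
        first.setOutputFormatClass(SequenceFileOutputFormat.class);
        FileInputFormat.addInputPath(first, input);
        FileOutputFormat.setOutputPath(first, intermediate);
        if (!first.waitForCompletion(true)) {
            System.exit(1);
        }

        // Job 2: reads the persisted map output and runs the reducer.
        Job second = Job.getInstance(conf, "map-reduce");
        second.setJarByClass(TwoJobDriver.class);
        second.setInputFormatClass(SequenceFileInputFormat.class);
        second.setMapperClass(Mapper.class);   // identity mapper; replace with your own
        second.setReducerClass(Reducer.class); // identity reducer; replace with your own
        second.setOutputKeyClass(LongWritable.class);
        second.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(second, intermediate);
        FileOutputFormat.setOutputPath(second, output);
        System.exit(second.waitForCompletion(true) ? 0 : 1);
    }
}
```

The intermediate directory stays in HDFS after both jobs finish. With two chained mappers you would repeat the map-only step once per mapper, each with its own output path.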
The first approach needs no custom code beyond job configuration, but the second does. It's up to you!
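For the second approach, one possible route (an assumption on my part, the answer does not name a specific mechanism) is `MultipleOutputs`, which lets a map task write an extra copy of its records under the job's output directory. The sketch below assumes the `org.apache.hadoop.mapreduce` API; the class name `SideOutputMapper` and the named output `mapcopy` are hypothetical, and I have not verified how this behaves when the mapper runs inside a `ChainMapper`.

```java
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Hypothetical mapper that emits each record to the reducer as usual and
// also writes a copy to a named side output that survives the job.
//
// The driver must register the named output first, for example:
//   MultipleOutputs.addNamedOutput(job, "mapcopy",
//       SequenceFileOutputFormat.class, LongWritable.class, Text.class);
public class SideOutputMapper extends Mapper<LongWritable, Text, LongWritable, Text> {

    private MultipleOutputs<LongWritable, Text> sideOutput;

    @Override
    protected void setup(Context context) {
        sideOutput = new MultipleOutputs<>(context);
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Copy kept in HDFS under the job output directory.
        sideOutput.write("mapcopy", key, value);
        // Normal map output, shuffled to the reducer.
        context.write(key, value);
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Closing flushes the side-output files; skipping this can lose data.
        sideOutput.close();
    }
}
```

The side files should appear in the job's output directory with names starting with `mapcopy`, next to the reducer's part files.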