Distributed Cache Files Hadoop

I want to attach different files to different reducers. Is this possible using the distributed cache in Hadoop?

I am able to attach the same file (or files) to all the reducers. But due to memory constraints, I want to know whether I can attach different files to different reducers.

Forgive me if it's an ignorant question.

Please help!

Thanks in advance!

Solution

This is an unusual requirement, because a reducer is not bound to a particular node: during execution, a reducer can run on any node, or even on several nodes (after a failure, or under speculative execution). Reducers should therefore be homogeneous; the only thing that distinguishes them is the data they process.

So I suppose that when you say you want to put different files on different reducers, what you actually want is for each reducer to have the files that correspond to the data (keys) it will be processing.

The only way I know to do this is to put your data on HDFS and read it from the reducer when it starts processing data.
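To make the idea concrete, here is a rough sketch in plain Java, not a definitive implementation. It assumes the job uses the default hash partitioning, (hashCode & Integer.MAX_VALUE) % numReduceTasks, and that you have pre-split your side data into one file per partition. The class name SideFileLookup, the "side-NNNNN.tsv" naming scheme, and the tab-separated layout are all made up for illustration. It reads from a local directory via java.nio so that it runs without a cluster; in a real job you would do the equivalent in the reducer's setup method, opening the file through Hadoop's FileSystem API and taking the partition number from the task's configuration (the "mapreduce.task.partition" property, as far as I know).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: maps a reducer's partition number to "its" side file.
final class SideFileLookup {

    // Assumption: default hash partitioning. Mask the sign bit so the
    // result is non-negative, then take the modulo of the reducer count.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Hypothetical naming convention: one side file per partition.
    static String sideFileName(int partition) {
        return String.format(Locale.ROOT, "side-%05d.tsv", partition);
    }

    // Loads only the side data for one partition, so each reducer keeps
    // just its own slice in memory. Lines are "key<TAB>value"; lines
    // without a tab are skipped.
    static Map<String, String> loadSideData(Path baseDir, int partition) throws IOException {
        Map<String, String> table = new HashMap<>();
        Path file = baseDir.resolve(sideFileName(partition));
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split("\t", 2);
                if (parts.length == 2) {
                    table.put(parts[0], parts[1]);
                }
            }
        }
        return table;
    }
}
```

The point of the design is that the side files have to be split with the same partitioning function the job uses; otherwise a reducer will receive keys whose side data sits in another reducer's file. Because the file is chosen by partition number and not by node, it still works when a reducer is rerun elsewhere after a failure or under speculative execution.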