My map function has to read a file for every input. That file doesn't change at all; it is only read. The DistributedCache might help me a lot, I think, but I can't find a way to use it. The public void configure(JobConf conf) function that I need to override is, I think, deprecated; well, JobConf is deprecated for sure. All the DistributedCache tutorials use the deprecated way too. What can I do? Is there another configure function that I can override?
These are the very first lines of my map function:
Configuration conf = new Configuration(); //load the MFile
FileSystem fs = FileSystem.get(conf);
Path inFile = new Path("planet/MFile");
FSDataInputStream in = fs.open(inFile);
DecisionTree dtree = new DecisionTree().loadTree(in);
I want to cache that MFile so that my map function doesn't need to read it over and over again.
JobConf was deprecated in 0.20.x, but in 1.0.0 it is not! :-) (as of writing this)
To your question: there are two ways to write MapReduce jobs in Java. One is by extending the classes in the org.apache.hadoop.mapreduce package, and the other is by implementing the classes in the org.apache.hadoop.mapred package (or the other way round). I'm not sure which one you are using; if you don't have a configure method to override, you will get a setup method to override:
@Override
protected void setup(Context context) throws IOException, InterruptedException
This is similar to configure and should help you. You get a setup method to override when you extend the Mapper class in the org.apache.hadoop.mapreduce package.
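If it helps, here is a rough sketch of how the two halves could fit together with the new API: register the file with the DistributedCache in the driver, then load it once in setup instead of on every map call. It's a minimal sketch, not a drop-in solution: PlanetJob, PlanetMapper and the LongWritable/Text key-value types are placeholders I made up, and I'm assuming your own DecisionTree.loadTree(...) returns the loaded tree the way your snippet suggests.

import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;

public class PlanetJob {

    public static class PlanetMapper extends Mapper<LongWritable, Text, Text, Text> {

        private DecisionTree dtree; // loaded once per task instead of once per record

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            Configuration conf = context.getConfiguration();
            // Files registered with the DistributedCache are copied to the task's local disk.
            Path[] cached = DistributedCache.getLocalCacheFiles(conf);
            if (cached != null && cached.length > 0) {
                FileSystem localFs = FileSystem.getLocal(conf);
                FSDataInputStream in = localFs.open(cached[0]);
                try {
                    dtree = new DecisionTree().loadTree(in); // same call as in your map function
                } finally {
                    in.close();
                }
            }
        }

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // use dtree here; no need to open planet/MFile for every record
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "planet");
        job.setJarByClass(PlanetJob.class);
        job.setMapperClass(PlanetMapper.class);
        // Register the read-only file; "planet/MFile" is the path from your snippet,
        // use a full hdfs:// URI if it is not on the default file system.
        DistributedCache.addCacheFile(new URI("planet/MFile"), job.getConfiguration());
        // ... set input/output formats and paths as usual, then:
        // job.waitForCompletion(true);
    }
}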