Hadoop 1.0.3 doesn't support bzip2 decompression, so I copied the relevant classes from Hadoop 2.2 into my project, which still runs on a Hadoop 1.0.3 cluster. However, I found that Hadoop still executes the 1.0.3 versions of those classes, i.e. my copied classes are never used.
How can I configure the job so that the classes in my own jar take precedence?
I know I could use something like:
hadoop jar collect_log.jar com.TestCol -Dmapreduce.task.classpath.user.precedence=true
But I'm running on EMR, and I don't know how to set this precedence there.
Thanks a lot!
EMR picks up its Hadoop jars from /home/hadoop/lib. You can use a bootstrap script to copy your new jars to this location.
Another option: when you launch EMR, connect to the master node using ssh and your key file, then run ps -ef | grep java.
That will show the running Hadoop processes and the order of jars on their classpaths. You can then adjust your bootstrap script to change the classpath order to match the order you need.
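For example (the key file name and master DNS are placeholders), inspecting a running JVM's classpath looks roughly like this:

```shell
# Connect to the EMR master node first (placeholders for key and host):
#   ssh -i mykey.pem hadoop@<master-public-dns-name>

# List the running Hadoop JVMs; the brackets stop grep matching itself,
# and || true keeps the pipeline from failing if no JVM is up yet.
ps -ef | grep '[j]ava' || true

# Pull just the -classpath value out of a process line; the echo below
# stands in for a real line of ps output.
echo "java -classpath /home/hadoop/lib/abc.jar:/home/hadoop/hadoop-core.jar Main" \
  | sed -n 's/.*-classpath \([^ ]*\).*/\1/p'
```

The order of entries in that `-classpath` value is the order classes are resolved in, which is why putting your jar in /home/hadoop/lib (or ahead of the stock jars) matters.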
Edited to add a sample bootstrap script, mybootstrap.sh:
#!/bin/bash
hadoop fs -copyToLocal s3n://bucket/bootstrap/abc.jar /home/hadoop/lib/
Upload this script to an S3 bucket and reference it from your EMR launcher code:
RunJobFlowRequest request = new RunJobFlowRequest(.....
ScriptBootstrapActionConfig bootstrapScriptConfig = new ScriptBootstrapActionConfig();
bootstrapScriptConfig.setPath(CONFIG_HADOOP_BOOTSTRAP_ACTION);
BootstrapActionConfig bootstrapConfig = new BootstrapActionConfig();
bootstrapConfig.setName("copy jar file");
bootstrapConfig.setScriptBootstrapAction(bootstrapScriptConfig);
request.withBootstrapActions(bootstrapConfig);
Here CONFIG_HADOOP_BOOTSTRAP_ACTION is the S3 path to your bootstrap script.
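If you launch the cluster from the command line rather than the Java SDK, the same bootstrap action can be attached there too. A rough sketch, assuming the current aws CLI (the bucket, script name, and cluster name are placeholders, and the usual instance/AMI options are omitted):

```shell
# Hypothetical equivalent of the Java snippet above using the aws CLI.
# You still need your normal instance-type, instance-count, and
# AMI/release options for this command to actually launch a cluster.
aws emr create-cluster \
  --name "collect-log-cluster" \
  --bootstrap-actions Path="s3://bucket/bootstrap/mybootstrap.sh",Name="copy jar file"
```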