How to make EMR load classes from a custom jar first

Hadoop 1.0.3 doesn't support bzip2 decompression, so I copied the relevant classes from Hadoop 2.2 into my project. The project (that is, my jar) still runs on a Hadoop 1.0.3 cluster, and I found that Hadoop still executes the classes from 1.0.3, i.e. the new classes are never used. How can I configure the job so that the classes in my own jar are loaded first? I know that on a plain Hadoop cluster I could use something like: hadoop jar collect_log.jar com.TestCol -Dmapreduce.task.classpath.user.precedence=true
But right now I'm using EMR, and I don't know how to set this precedence there. Thanks a lot!

Solution

EMR loads its Hadoop jars from /home/hadoop/lib. You can try using a bootstrap script to copy your new jars to this location.

Another option: after you launch the EMR cluster, connect to the master node over SSH with your key file and run ps -ef | grep java.

The output shows the running Hadoop processes and the order of the jars on their classpath. You can then change the bootstrap script to rearrange the classpath into the order you need.
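The classpath in the ps output is one long colon-separated string, which is hard to read. As a rough sketch (the function name classpath_entries is made up for illustration, and it assumes the JVM was started with an explicit -classpath or -cp argument), you could split it into one entry per line to see the order:

```shell
#!/bin/bash
# Hypothetical helper: read process listings on stdin and print every
# classpath entry on its own line, in the order the JVM searches them.
# Assumes the classpath is passed as "-classpath <value>" or "-cp <value>".
classpath_entries() {
  awk '{
    for (i = 1; i < NF; i++) {
      if ($i == "-classpath" || $i == "-cp") {
        n = split($(i + 1), parts, ":")
        for (j = 1; j <= n; j++) print parts[j]
      }
    }
  }'
}

# Example use on the master node (the [j] keeps grep from matching itself):
#   ps -efww | grep '[j]ava' | classpath_entries
```

Entries nearer the top are searched first, so your jar has to appear before the Hadoop jar that contains the old classes.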

Edited to add a sample bootstrap script, mybootstrap.sh:

#!/bin/bash
hadoop fs -copyToLocal s3n://bucket/bootstrap/abc.jar /home/hadoop/lib/

Upload this script to an S3 bucket and attach it in your EMR launcher code like this:

        RunJobFlowRequest request = new RunJobFlowRequest(.....
        ScriptBootstrapActionConfig bootstrapScriptConfig = new ScriptBootstrapActionConfig();
        bootstrapScriptConfig.setPath(CONFIG_HADOOP_BOOTSTRAP_ACTION);

        BootstrapActionConfig bootstrapConfig = new BootstrapActionConfig();
        bootstrapConfig.setName("copy jar file");
        bootstrapConfig.setScriptBootstrapAction(bootstrapScriptConfig);
        request.withBootstrapActions(bootstrapConfig);

Here CONFIG_HADOOP_BOOTSTRAP_ACTION is the S3 path of your bootstrap script.