I am doing this:
DistributedCache.createSymlink(job.getConfiguration());
DistributedCache.addCacheFile(new URI("hdfs:/user/hadoop/harsh/libnative1.so"),
        job.getConfiguration());
and in the mapper:
System.loadLibrary("libnative1.so");
(I also tried System.loadLibrary("libnative1"); and System.loadLibrary("native1");)
But I am getting this error:
java.lang.UnsatisfiedLinkError: no libnative1.so in java.library.path
I am totally clueless about what to set java.library.path to. I tried setting it to /home and copied every .so from the distributed cache to /home/, but it still didn't work :(
Any suggestions / solutions please?
Use Hadoop's ToolRunner together with the Tool interface; this gives you the ability to add the shared libraries to the distributed cache via command-line arguments, and java.library.path will be set properly on the task nodes before the mappers start. This is how I set up a mapper to use shared libraries:
Have the job class (that contains the main() method) implement the org.apache.hadoop.util.Tool interface. Something like this:
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class Job extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        /* create the Hadoop Job here */
        return 0;
    }

    public static void main(String[] args) {
        int ret;
        try {
            // ToolRunner parses the generic options (-files, -D, ...) before calling run()
            ret = ToolRunner.run(new Job(), args);
        } catch (Exception e) {
            e.printStackTrace();
            ret = -1;
        }
        System.exit(ret);
    }
}
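If it helps, here is a rough sketch of what the body of run() might contain. The mapper class MyMapper, the job name, and the use of args[0]/args[1] as input and output paths are placeholders, not part of the original post; the MapReduce Job class is referenced by its fully qualified name only because the driver class above is itself named Job.

    @Override
    public int run(String[] args) throws Exception {
        // getConf() returns the Configuration that ToolRunner has already
        // populated from the generic options (-files, -D, ...), so the shared
        // libraries end up in the distributed cache without any extra code here.
        org.apache.hadoop.mapreduce.Job mrJob =
                new org.apache.hadoop.mapreduce.Job(getConf(), "native-lib-job");
        mrJob.setJarByClass(Job.class);
        mrJob.setMapperClass(MyMapper.class);   // MyMapper is a placeholder mapper class
        mrJob.setNumReduceTasks(0);             // map-only sketch
        org.apache.hadoop.mapreduce.lib.input.FileInputFormat
                .addInputPath(mrJob, new org.apache.hadoop.fs.Path(args[0]));
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormat
                .setOutputPath(mrJob, new org.apache.hadoop.fs.Path(args[1]));
        return mrJob.waitForCompletion(true) ? 0 : 1;
    }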
When running the Hadoop job, pass all of the shared libraries (local copies) via the -files command-line argument. If any of them are symlinks, make sure to list the actual files as well. Hadoop will copy every file given in the -files argument to the distributed cache before starting the job.
hadoop jar Job.jar -files libnative1.so,libnative1.so.0,libnative1.so.0.1
The mapper does not require any special calls to set java.library.path; Hadoop takes care of that before the task starts.
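As a minimal sketch of the mapper side: System.loadLibrary expects the base library name, so for a file named libnative1.so you load "native1" (no "lib" prefix, no ".so" suffix). The native method transform() below is a hypothetical JNI entry point just to show where the library would be used; it is not from the original post.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MyMapper extends Mapper<LongWritable, Text, Text, NullWritable> {

    static {
        // "native1" resolves to libnative1.so, which Hadoop has placed on
        // java.library.path via the -files mechanism.
        System.loadLibrary("native1");
    }

    // hypothetical JNI entry point assumed to be exported by libnative1.so
    private native String transform(String input);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.write(new Text(transform(value.toString())), NullWritable.get());
    }
}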