Tags: hadoop, mapreduce, hadoop2

Hadoop command line -D options not working


I am trying to pass a variable (not a property) using the -D command-line option in Hadoop, like -Dmapred.mapper.mystring=somexyz. I am able to set a conf property in the driver program and read it back in the mapper, so I could pass my string as an additional parameter and set it in the driver. But I want to see whether the -D option can be used to do the same.

My command is:

$HADOOP_HOME/bin/hadoop jar  /home/hduser/Hadoop_learning_path/toolgrep.jar /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput -Dmapred.mapper.mystring=somexyz

Driver program

String s_ptrn=conf.get("mapred.mapper.regex");

System.out.println("debug: in Tool Class mapred.mapper.regex "+s_ptrn + "\n"); // prints null

But this works:

conf.set("DUMMYVAL","100000000000000000000000000000000000000");

When set like this in the driver, the value is read properly in the mapper with the get method.

My question is: if all of the Internet says I can use the -D option, then why can't I? Is it that -D cannot be used for arbitrary arguments, only for properties, which I would have to put in a file, read in the driver program, and then use?

Something like

Configuration conf = new Configuration();
conf.addResource("~/conf.xml"); 

in the driver program, and is this the only way?


Solution

  • As Thomas wrote, you are missing the space. You are also passing the variable mapred.mapper.mystring on the command line, but in the code you are trying to get mapred.mapper.regex. If you want to use the -D parameter, you should be using the Tool interface. More about it is here: Hadoop: Implementing the Tool interface for MapReduce driver.

    Or you can parse your CLI arguments like this:

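    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();

        // Let GenericOptionsParser consume the generic options; keep the rest
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        String yourVariable = null;
        for (int i = 0; i < otherArgs.length; i++) {
            if (otherArgs[i].equals("-x") && i + 1 < otherArgs.length) {
                // Save your CLI argument (the token that follows -x)
                yourVariable = otherArgs[++i];
            }
        }
        // Then save yourVariable into conf for use in the map phase,
        // e.g. conf.set("mapred.mapper.mystring", yourVariable);
        // ... set up and submit the job here ...
        return 0;
    }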
    

    Then your command can look like this:

    $HADOOP_HOME/bin/hadoop jar /home/hduser/Hadoop_learning_path/toolgrep.jar /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput -x yourVariable
    

    Hope it helps
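
One more thing that is probably worth checking in the original command: as far as I know, Hadoop's generic options (including -D) are only recognised when they appear before the application arguments, because parsing stops at the first token that is not an option. The sketch below is a simplified, Hadoop-free model of that behaviour, not Hadoop's actual GenericOptionsParser; the class name GenericOptionsSketch and its fields props and remaining are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model (an assumption, not Hadoop code): collect leading -D options
// and stop at the first argument that is not a -D option.
class GenericOptionsSketch {
    /** Properties collected from the leading -D options. */
    public final Map<String, String> props = new LinkedHashMap<>();
    /** Everything from the first non-option argument onward, untouched. */
    public final List<String> remaining = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length && args[i].startsWith("-D")) {
            String kv = args[i].substring(2);
            if (kv.isEmpty()) {
                // "-D key=value" form: the pair is the next token
                if (i + 1 >= args.length) {
                    break; // dangling -D with no value; leave it in remaining
                }
                kv = args[++i];
            }
            int eq = kv.indexOf('=');
            if (eq > 0) {
                props.put(kv.substring(0, eq), kv.substring(eq + 1));
            }
            i++;
        }
        // Parsing has stopped; later tokens are plain application arguments,
        // even if they happen to look like -D options.
        for (; i < args.length; i++) {
            remaining.add(args[i]);
        }
    }

    public static void main(String[] args) {
        // -D placed after the input/output paths, as in the question
        GenericOptionsSketch late = new GenericOptionsSketch(
            new String[] {"/in", "/out", "-Dmapred.mapper.mystring=somexyz"});
        System.out.println(late.props + " " + late.remaining);

        // -D placed before the paths
        GenericOptionsSketch early = new GenericOptionsSketch(
            new String[] {"-D", "mapred.mapper.mystring=somexyz", "/in", "/out"});
        System.out.println(early.props + " " + early.remaining);
    }
}
```

In the first case the property map stays empty and the -D token is handed back as an ordinary argument, which would match the null seen in the question. If that is what is happening, and the driver is run through ToolRunner, moving the option in front of the paths should help: $HADOOP_HOME/bin/hadoop jar /home/hduser/Hadoop_learning_path/toolgrep.jar -D mapred.mapper.mystring=somexyz /home/hduser/hadoopData/inputdir/ /home/hduser/hadoopData/grepoutput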