Hadoop2 client on Windows for a Linux Cluster


We have a Linux Hadoop cluster, but for a variety of reasons we have some Windows clients connecting to it and pushing data to it. In Hadoop 1 we were able to run Hadoop via Cygwin. However, in Hadoop 2, as stated on the website, Cygwin is neither required nor supported.

Questions

  1. What exactly has changed? Why would a client (only) not run under Cygwin, or could it? Apart from paths, what other considerations are at play?
  2. Apart from the property below for job submissions, is there anything else that needs to be considered for a Windows client interacting with a Linux cluster? (A command-line equivalent is sketched right below it.)

    conf.set("mapreduce.app-submission.cross-platform", "true");
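
     If the job driver goes through ToolRunner/GenericOptionsParser, the same property can also be passed on the command line at submission time. A minimal sketch, where myjob.jar, com.example.MyDriver and the input/output paths are placeholders:

        # equivalent to the conf.set(...) call above, assuming the driver uses ToolRunner
        hadoop jar myjob.jar com.example.MyDriver \
          -D mapreduce.app-submission.cross-platform=true \
          /input /output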

  3. Extracting hadoop-2.6.0-cdh5.5.2 and running it from Cygwin with the right configurations under $HADOOP_HOME/etc yields classpath-formation and class-not-found issues. For instance, the following run:

    hdfs dfs -ls
    Error: Could not find or load main class org.apache.hadoop.fs.FsShell
    

Then, looking at the classpath, it appears to contain Cygwin paths. I attempted to convert them to Windows paths so that the jars can be found (what cygpath -p -w produces is illustrated after the trace below).

In $HADOOP_HOME/bin/hdfs, locate the dfs command and change it to:
      elif [ "$COMMAND" = "dfs" ] ; then
        if $cygwin; then
          CLASSPATH=`cygpath -p -w "$CLASSPATH"`
        fi
        CLASS=org.apache.hadoop.fs.FsShell

This results in the following:

    16/04/07 16:01:05 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
            at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
            at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
            at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
            at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
            at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
            at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
            at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
            at org.apache.hadoop.fs.FsShell.main(FsShell.java:362)
    16/04/07 16:01:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Warning: fs.defaultFs is not set when running "ls" command.
    Found 15 items
    -ls: Fatal internal error
    java.lang.NullPointerException
            at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
            at org.apache.hadoop.util.Shell.run(Shell.java:478)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
            at org.apache.hadoop.util.Shell.execCommand(Shell.java:831)
            at org.apache.hadoop.util.Shell.execCommand(Shell.java:814)
            at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1100)
            at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
            at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getOwner(RawLocalFileSystem.java:565)
            at org.apache.hadoop.fs.shell.Ls.adjustColumnWidths(Ls.java:139)
            at org.apache.hadoop.fs.shell.Ls.processPaths(Ls.java:110)
            at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
            at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:98)
            at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
            at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
            at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
            at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
            at org.apache.hadoop.fs.FsShell.run(FsShell.java:305)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
            at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
            at org.apache.hadoop.fs.FsShell.main(FsShell.java:362)
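
For reference, cygpath -p -w is what performs the conversion in the snippet above: it turns a colon-separated list of Cygwin paths into a semicolon-separated list of Windows paths. A made-up example (the paths are placeholders):

    cygpath -p -w "/cygdrive/c/hadoop/etc/hadoop:/cygdrive/c/hadoop/share/hadoop/common/lib/*"
    C:\hadoop\etc\hadoop;C:\hadoop\share\hadoop\common\lib\*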

Given the above, my question is: should I go further and try to fix this so that I can reuse my existing client .sh scripts, or should I just convert them to .bat files?


Solution

  • The problem is that Cygwin needs to return Windows paths rather than Cygwin paths. Also, winutils.exe needs to be installed in the path, as described here.

    Simply fix the scripts to return the actual Windows paths and turn off a few commands that don't run under Cygwin (a quick sanity check is sketched after the script):

    #!/bin/bash
    # fix $HADOOP_HOME/bin/hdfs
    sed -i -e "s/bin=/#bin=/g" "$HADOOP_HOME/bin/hdfs"
    sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"\$HADOOP_HOME\\\libexec\"#g" "$HADOOP_HOME/bin/hdfs"
    sed -i "/export CLASSPATH=\$CLASSPATH/i CLASSPATH=\`cygpath -p -w \"\$CLASSPATH\"\`" "$HADOOP_HOME/bin/hdfs"

    # fix $HADOOP_HOME/libexec/hdfs-config.sh
    sed -i -e "s/bin=/#bin=/g" "$HADOOP_HOME/libexec/hdfs-config.sh"
    sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"\$HADOOP_HOME\\\libexec\"#g" "$HADOOP_HOME/libexec/hdfs-config.sh"

    # fix $HADOOP_HOME/libexec/hadoop-config.sh
    sed -i "/HADOOP_DEFAULT_PREFIX=/a HADOOP_PREFIX=" "$HADOOP_HOME/libexec/hadoop-config.sh"
    sed -i "/export HADOOP_PREFIX/i HADOOP_PREFIX=\`cygpath -p -w \"\$HADOOP_PREFIX\"\`" "$HADOOP_HOME/libexec/hadoop-config.sh"

    # fix $HADOOP_HOME/bin/hadoop
    sed -i -e "s/bin=/#bin=/g" "$HADOOP_HOME/bin/hadoop"
    sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"\$HADOOP_HOME\\\libexec\"#g" "$HADOOP_HOME/bin/hadoop"
    sed -i "/export CLASSPATH=\$CLASSPATH/i CLASSPATH=\`cygpath -p -w \"\$CLASSPATH\"\`" "$HADOOP_HOME/bin/hadoop"
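
    After applying the fixes, a quick sanity check from a Cygwin shell might look like the sketch below (fix-cygwin.sh is a placeholder name for the commands above saved to a file, and a winutils.exe matching the Hadoop build is assumed to be available):

        # run the sed fixes above (saved here as fix-cygwin.sh, a placeholder name)
        bash fix-cygwin.sh

        # winutils.exe belongs under $HADOOP_HOME/bin, otherwise the
        # "Could not locate executable null\bin\winutils.exe" error persists
        ls "$HADOOP_HOME"/bin/winutils.exe

        # the reported classpath should now contain Windows-style paths
        "$HADOOP_HOME"/bin/hadoop classpath

        # and the client should be able to reach the Linux cluster
        "$HADOOP_HOME"/bin/hdfs dfs -ls /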