We have a linux hadoop cluster but for a variety of reasons have some windows clients connecting and pushing data to the linux cluster. In hadoop1 we had been able to run hadoop via cygwin However in hadoop2 as stated on the website cygwin is not required or not supported.
Questions
Apart from the property below for job submissions is there anything else that needs to considered for windows/client interacting with a linux cluster
conf.set("mapreduce.app-submission.cross-platform", "true");
Extracting the hadoop-2.6.0-cdh5.5.2 and running it from cygwin with the right configurations under $HADOOP_HOME/etc yields some classpath or classpath formation issues class not found issues ? For instance the following run
hdfs dfs -ls
Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Then looking at the classpath looks like they contain cygwin paths . attempt to convert them to windows paths so that the jar can be looked up
in $HADOOP_HOME/etc/hdfs.sh locate the dfs command and change to
elif [ "$COMMAND" = "dfs" ] ; then
if $cygwin; then
CLASSPATH=`cygpath -p -w "$CLASSPATH"`
fi
CLASS=org.apache.hadoop.fs.FsShell
This results in the following:
16/04/07 16:01:05 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:378)
at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:393)
at org.apache.hadoop.util.Shell.<clinit>(Shell.java:386)
at org.apache.hadoop.util.GenericOptionsParser.preProcessForWindows(GenericOptionsParser.java:438)
at org.apache.hadoop.util.GenericOptionsParser.parseGeneralOptions(GenericOptionsParser.java:484)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:170)
at org.apache.hadoop.util.GenericOptionsParser.<init>(GenericOptionsParser.java:153)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:64)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:362)
16/04/07 16:01:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Warning: fs.defaultFs is not set when running "ls" command.
Found 15 items
-ls: Fatal internal error
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1010)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:505)
at org.apache.hadoop.util.Shell.run(Shell.java:478)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:738)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:831)
at org.apache.hadoop.util.Shell.execCommand(Shell.java:814)
at org.apache.hadoop.fs.FileUtil.execCommand(FileUtil.java:1100)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:582)
at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getOwner(RawLocalFileSystem.java:565)
at org.apache.hadoop.fs.shell.Ls.adjustColumnWidths(Ls.java:139)
at org.apache.hadoop.fs.shell.Ls.processPaths(Ls.java:110)
at org.apache.hadoop.fs.shell.Command.recursePath(Command.java:373)
at org.apache.hadoop.fs.shell.Ls.processPathArgument(Ls.java:98)
at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:271)
at org.apache.hadoop.fs.shell.Command.processArguments(Command.java:255)
at org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:118)
at org.apache.hadoop.fs.shell.Command.run(Command.java:165)
at org.apache.hadoop.fs.FsShell.run(FsShell.java:305)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
at org.apache.hadoop.fs.FsShell.main(FsShell.java:362)
For the above my question should I be going further to try and fix this so that i can reuse my existing client .sh scripts or just convert them .bat ?
the problem is that cygwin needs to return windows paths rather than cygwin paths. Also winutils.exe needs to be installed in the path as described here
Simply fix the scripts to return the actual win paths and turn off a few commands which don't run under cygwin
#!/bin/bash
# fix $HADOOP_HOME/bin/hdfs
sed -i -e "s/bin=/#bin=/g" $HADOOP_HOME/bin/hdfs
sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"\$HADOOP_HOME\\\libexec\"#g" $HADOOP_HOME/bin/hdfs
sed -i "/export CLASSPATH=$CLASSPATH/i CLASSPATH=\`cygpath -p -w \"\$CLASSPATH\"\`" $HADOOP_HOME/bin/hdfs
# fix $HADOOP_HOME/libexec/hdfs-config.sh
sed -i -e "s/bin=/#bin=/g" $HADOOP_HOME/libexec/hdfs-config.sh
sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"\$HADOOP_HOME\\\libexec\"#g" $HADOOP_HOME/libexec/hdfs-config.sh
# fix $HADOOP_HOME/libexec/hadoop-config.sh
sed -i "/HADOOP_DEFAULT_PREFIX=/a HADOOP_PREFIX=" $HADOOP_HOME/libexec/hadoop-config.sh
sed -i "/export HADOOP_PREFIX/i HADOOP_PREFIX=\`cygpath -p -w \"\$HADOOP_PREFIX\"\`" $HADOOP_HOME/libexec/hadoop-config.sh
# fix $HADOOP_HOME/bin/hadoop
sed -i -e "s/bin=/#bin=/g" $HADOOP_HOME/bin/hadoop
sed -i -e "s#DEFAULT_LIBEXEC_DIR=\"\$bin\"/../libexec#DEFAULT_LIBEXEC_DIR=\"\$HADOOP_HOME\\\libexec\"#g" $HADOOP_HOME/bin/hadoop
sed -i "/export CLASSPATH=$CLASSPATH/i CLASSPATH=\`cygpath -p -w \"\$CLASSPATH\"\`" $HADOOP_HOME/bin/hadoop