
How to run Giraph on YARN (Hadoop 2.6) ('Worker failed during input split')


I'm trying to set up a pseudo-distributed Hadoop 2.6 cluster for running Giraph jobs. As I couldn't find a comprehensive guide for that, I've been relying on the Giraph Quick Start (http://giraph.apache.org/quick_start.html), which unfortunately targets Hadoop 0.20.203.0, and on pieces of various Hadoop 2.6/YARN tutorials. To do the Right Thing, I came up with a bash script that should install Hadoop and Giraph. Unfortunately, Giraph jobs fail repeatedly with a 'Worker failed during input split' exception. I would really appreciate it if anyone could point out an error in my deployment procedure or suggest another working approach.

EDIT: My primary objective is to be able to develop Giraph 1.1 jobs. I don't need to run any heavy computation on my own (in the end, the jobs will be run on an external cluster), so if there is any easier way to set up a Giraph development environment, it will do.
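(Editor's note, a sketch not from the original post: if the goal is only a development environment rather than a working local cluster, one lighter option may be to skip the source build entirely and depend on the released Giraph 1.1.0 artifacts from Maven Central, compiling and unit-testing jobs locally and submitting them to the external cluster later. A minimal `pom.xml` dependency fragment, assuming the standard `org.apache.giraph` coordinates:)

```xml
<dependencies>
    <!-- Giraph core APIs (GiraphRunner, Computation, IO formats) -->
    <dependency>
        <groupId>org.apache.giraph</groupId>
        <artifactId>giraph-core</artifactId>
        <version>1.1.0</version>
    </dependency>
    <!-- Hadoop client libraries matching the target cluster -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0</version>
    </dependency>
</dependencies>
```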

The installation script is as follows:

#!/bin/bash
set -exu

echo "Starting hadoop + giraph installation; JAVA HOME is $JAVA_HOME"

INSTALL_DIR=~/apache_hadoop_giraph


mkdir -p $INSTALL_DIR/downloads

############# PHASE 1: YARN ##############

#### 1: Get and unpack Hadoop:

if [ ! -f $INSTALL_DIR/downloads/hadoop-2.6.0.tar.gz ]; then
  wget -P $INSTALL_DIR/downloads ftp://ftp.task.gda.pl/pub/www/apache/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
fi
tar -xf $INSTALL_DIR/downloads/hadoop-2.6.0.tar.gz -C $INSTALL_DIR

export HADOOP_PREFIX=$INSTALL_DIR/hadoop-2.6.0
export HADOOP_HOME=$HADOOP_PREFIX
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop


#### 2: Configure Hadoop and YARN

sed -i -e "s|^export JAVA_HOME=\${JAVA_HOME}|export JAVA_HOME=$JAVA_HOME|g" ${HADOOP_PREFIX}/etc/hadoop/hadoop-env.sh

cat <<EOF > ${HADOOP_PREFIX}/etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
EOF

cat <<EOF > ${HADOOP_PREFIX}/etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
EOF

cat <<EOF > ${HADOOP_PREFIX}/etc/hadoop/mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
EOF

cat <<EOF > ${HADOOP_PREFIX}/etc/hadoop/yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>
EOF

#### 3: Prepare HDFS:

cd $HADOOP_PREFIX
export HDFS=$HADOOP_PREFIX/bin/hdfs

sbin/stop-all.sh # Just to be sure we have no running daemons

# The following line is commented out in case some of SO readers have something important in /tmp:
# rm -rf /tmp/* || echo "removal of some parts of tmp failed"

$HDFS namenode -format
sbin/start-dfs.sh


#### 4: Create HDFS directories:
$HDFS dfs -mkdir -p /user
$HDFS dfs -mkdir -p /user/`whoami`



#### 5 (optional): Run a test job

sbin/start-yarn.sh
$HDFS dfs -put etc/hadoop input
bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
$HDFS dfs -cat output/*   # Prints the matches grep'd out of the input files
sbin/stop-yarn.sh

#### 6: Stop HDFS for now
sbin/stop-dfs.sh





############# PHASE 2: Giraph ##############

#### 1: Get Giraph 1.1

export GIRAPH_HOME=$INSTALL_DIR/giraph
cd $INSTALL_DIR
git clone http://git-wip-us.apache.org/repos/asf/giraph.git giraph
cd $GIRAPH_HOME
git checkout release-1.1

#### 2: Build 

mvn -Phadoop_2 -Dhadoop.version=2.6.0 -DskipTests package 


#### 3: Run a test job:

# Remove leftovers if any:
$HADOOP_HOME/sbin/start-dfs.sh
$HDFS dfs -rm -r -f /user/`whoami`/output
$HDFS dfs -rm -r -f /user/`whoami`/input/tiny_graph.txt
$HDFS dfs -mkdir -p /user/`whoami`/input

# Place input:
$HDFS dfs -put tiny_graph.txt input/tiny_graph.txt

# Start YARN
$HADOOP_HOME/sbin/start-yarn.sh

# Run the job (this fails with 'Worker failed during input split'):
JAR=$GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.6.0-jar-with-dependencies.jar
CORE=$GIRAPH_HOME/giraph-core/target/giraph-1.1.0-for-hadoop-2.6.0-jar-with-dependencies.jar
$HADOOP_HOME/bin/hadoop jar $JAR \
         org.apache.giraph.GiraphRunner \
         org.apache.giraph.examples.SimpleShortestPathsComputation \
         -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
         -vip /user/`whoami`/input/tiny_graph.txt \
         -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
         -op /user/`whoami`/output/shortestpaths \
         -yj $JAR,$CORE \
         -w 1 \
         -ca giraph.SplitMasterWorker=false

The script runs smoothly up to the last command, which hangs for a long time at 'map 100% reduce 0%'; inspecting the log files of the YARN containers reveals the mysterious java.lang.IllegalStateException: coordinateVertexInputSplits: Worker failed during input split (currently not supported). Full container logs are available on pastebin:

Container 1 (master): http://pastebin.com/6nYvtNxJ

Container 2 (worker): http://pastebin.com/3a6CQamQ
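(Editor's note, a sketch not from the original post: the container logs above can be collected in one step with the standard `yarn logs` command, provided log aggregation is enabled via `yarn.log-aggregation-enable=true` in yarn-site.xml. The `extract_app_id` helper below is a hypothetical convenience for pulling the application id out of saved client output:)

```shell
#!/bin/bash

# Hypothetical helper: extract the first YARN application id
# (e.g. application_1423649702_0001) from saved job client output.
extract_app_id() {
  printf '%s\n' "$1" | grep -o 'application_[0-9]*_[0-9]*' | head -n 1
}

# Usage sketch (assumes the job client output was saved to job.log):
# APP_ID=$(extract_app_id "$(cat job.log)")
# $HADOOP_HOME/bin/yarn logs -applicationId "$APP_ID"
```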

I have also tried building Giraph with hadoop_yarn profile (after removing STATIC_SASL_SYMBOL from pom.xml), but it doesn't change anything.

I'm running Ubuntu 14.10 64bit with 4GB RAM and 16GB of swap. Extra system info:

>> uname -a
Linux Graffi 3.13.0-35-generic #62-Ubuntu SMP Fri Aug 15 01:58:42 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>> which java
/usr/bin/java
>> java -version
java version "1.7.0_75"
OpenJDK Runtime Environment (IcedTea 2.5.4) (7u75-2.5.4-1~trusty1)
OpenJDK 64-Bit Server VM (build 24.75-b04, mixed mode)
>> echo $JAVA_HOME
/usr/lib/jvm/java-7-openjdk-amd64/jre
>> which mvn
/usr/bin/mvn
>> mvn --version
Apache Maven 3.0.5
Maven home: /usr/share/maven
Java version: 1.7.0_75, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.13.0-35-generic", arch: "amd64", family: "unix"

I would be really grateful for any help on how to get Giraph 1.1 running on Hadoop 2.6.


Solution

  • I had a similar problem a while ago. The cause was that my computer's hostname contained uppercase letters, which triggers a known Giraph bug (https://issues.apache.org/jira/browse/GIRAPH-904). Changing the hostname to all-lowercase letters fixed it for me.
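(Editor's note: the GIRAPH-904 check above can be sketched as a quick shell test; notably, the `uname -a` output in the question shows the hostname `Graffi`, which does contain an uppercase letter. The `is_lowercase_hostname` helper is a hypothetical name; actually renaming the host, e.g. with `hostnamectl set-hostname` and a matching /etc/hosts entry, is left to the admin:)

```shell
#!/bin/bash

# Returns success (0) if the given name contains no uppercase letters,
# i.e. it is safe with respect to GIRAPH-904.
is_lowercase_hostname() {
  [ "$1" = "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')" ]
}

# Usage sketch:
# if ! is_lowercase_hostname "$(hostname)"; then
#   echo "Hostname has uppercase letters; Giraph workers may fail (GIRAPH-904)"
# fi
```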