Tags: apache-zookeeper, hadoop-yarn, giraph

ZooKeeper configs for Giraph 1.0 on Hadoop 2.2.0


I'm new to Stack Exchange and Giraph, so please overlook any mistakes and ask clarifying questions.

OS: Ubuntu 13.10

Hadoop/YARN: hadoop-2.2.0 (2-node cluster)

Giraph: 1.0.0 (EDIT: actually trunk, built as 1.1.0-SNAPSHOT)

I'm getting a NullPointerException (NPE) when I attempt to run the following example:

$ hadoop jar $GIRAPH_HOME/giraph-examples/target/giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2.0-jar-with-dependencies.jar \
    org.apache.giraph.GiraphRunner \
    org.apache.giraph.examples.SimpleShortestPathsComputation \
    -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
    -vip /user/hduser/rrdata/tiny_graph.txt \
    -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
    -op /user/hduser/rrdata/output/tiny_graph.out \
    -w 1

Stack Trace:

Exception in thread "main" java.lang.NullPointerException
    at org.apache.giraph.yarn.GiraphYarnClient.checkJobLocalZooKeeperSupported(GiraphYarnClient.java:460)
    at org.apache.giraph.yarn.GiraphYarnClient.run(GiraphYarnClient.java:116)
    at org.apache.giraph.GiraphRunner.run(GiraphRunner.java:96)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.giraph.GiraphRunner.main(GiraphRunner.java:126)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)

It seems ZooKeeper-related. I installed ZooKeeper, but since I haven't used it before, I suspect my configuration is wrong. I've tried -Dgiraph.zkList=hostname:port and related options, but I get an 'Unrecognized option' exception.

I'm looking for the correct ZooKeeper settings for this scenario. I'll post a reply if I figure it out.
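A likely cause of the 'Unrecognized option' error is argument order. GiraphRunner is started through Hadoop's ToolRunner (see ToolRunner.run in the stack trace above), which hands the arguments to GenericOptionsParser. As far as I know, that parser only consumes generic options such as -D at the front of the argument list and stops at the first non-option argument, so a -D placed after the computation class is left for GiraphRunner's own option parsing, which presumably does not recognize it. The Java sketch below is a simplified, hypothetical model of that ordering rule (the class LeadingDefines and its split method are made up for illustration; only -D is handled, while the real parser also accepts -conf, -fs, -libjars and others). It is not Hadoop's actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LeadingDefines {
    /** Result of the split: parsed -D properties plus the untouched remaining args. */
    static final class Split {
        final Map<String, String> props = new LinkedHashMap<>();
        final List<String> rest = new ArrayList<>();
    }

    /**
     * Hypothetical, simplified model of generic-option handling: consume
     * "-D key=value" or "-Dkey=value" only while they appear at the front,
     * stop at the first non-option argument, and keep everything after it.
     */
    static Split split(String[] args) {
        Split s = new Split();
        int i = 0;
        while (i < args.length && args[i].startsWith("-D")) {
            String kv;
            if (args[i].equals("-D")) {
                // "-D key=value": the property is the next argument.
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-D needs a key=value argument");
                }
                kv = args[i + 1];
                i += 2;
            } else {
                // "-Dkey=value": the property is glued to the flag.
                kv = args[i].substring(2);
                i += 1;
            }
            int eq = kv.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got '" + kv + "'");
            }
            s.props.put(kv.substring(0, eq), kv.substring(eq + 1));
        }
        // Everything from the first non-option onward goes to the tool unchanged.
        s.rest.addAll(Arrays.asList(args).subList(i, args.length));
        return s;
    }

    public static void main(String[] args) {
        // -D before the computation class: picked up as a property.
        Split early = split(new String[] {"-D", "giraph.zkList=zkNode.net:2081",
            "org.apache.giraph.examples.SimpleShortestPathsComputation", "-w", "1"});
        System.out.println(early.props + " " + early.rest);

        // -D after the computation class: left in the tool's own arguments.
        Split late = split(new String[] {"org.apache.giraph.examples.SimpleShortestPathsComputation",
            "-Dgiraph.zkList=zkNode.net:2081", "-w", "1"});
        System.out.println(late.props + " " + late.rest);
    }
}
```

Under this model, -D giraph.zkList=... only takes effect when it sits between org.apache.giraph.GiraphRunner and the computation class, which matches where the solution below places it.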


Solution

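  • Here is an example of how to specify -D flags. Note that the -D option comes directly after org.apache.giraph.GiraphRunner and before the computation class: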

    hadoop jar giraph-examples-1.1.0-SNAPSHOT-for-hadoop-2.2-jar-with-dependencies.jar \
        org.apache.giraph.GiraphRunner \
        -D giraph.zkList="zkNode.net:2081" \
        org.apache.giraph.examples.SimpleShortestPathsComputation \
        -vif org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat \
        -vip /user/rav/giraph/input/tiny_graph.txt \
        -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat \
        -op /user/rav/giraph/output/shortestpaths \
        -w 1
    

    By the way, a Giraph-managed (local) ZooKeeper is not yet supported when running Giraph on YARN, as this check in GiraphYarnClient shows:

      /**
       * Check if the job's configuration is for a local run. These can all be
       * removed as we expand the functionality of the "pure YARN" Giraph profile.
       */
      private void checkJobLocalZooKeeperSupported() {
        final boolean isZkExternal = giraphConf.isZookeeperExternal();
        final String checkZkList = giraphConf.getZookeeperList();
        // When isZkExternal is true but giraph.zkList was never set,
        // checkZkList is null and checkZkList.isEmpty() throws the NPE.
        if (!isZkExternal || checkZkList.isEmpty()) {
          throw new IllegalArgumentException("Giraph on YARN does not currently" +
              "support Giraph-managed ZK instances: use a standalone ZooKeeper.");
        }
      }
    

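    Unfortunately, when giraph.zkList is not set, checkZkList is null, so checkZkList.isEmpty() throws the NullPointerException from the question and you never get to see this more helpful exception :)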
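For comparison, a null-safe version of that check would treat an unset giraph.zkList like an empty one and raise the intended IllegalArgumentException instead of an NPE. This is a hypothetical sketch, not Giraph's code: the method name checkExternalZooKeeper and its two parameters stand in for giraphConf.isZookeeperExternal() and giraphConf.getZookeeperList() so the example runs without Giraph on the classpath. It also adds the space that is missing between "currently" and "support" in the quoted message.

```java
public class ZkCheck {
    /**
     * Hypothetical null-safe variant of checkJobLocalZooKeeperSupported():
     * a null zkList is rejected the same way as an empty one.
     */
    static void checkExternalZooKeeper(boolean isZkExternal, String zkList) {
        if (!isZkExternal || zkList == null || zkList.isEmpty()) {
            throw new IllegalArgumentException("Giraph on YARN does not currently "
                + "support Giraph-managed ZK instances: use a standalone ZooKeeper "
                + "(set giraph.zkList).");
        }
    }

    public static void main(String[] args) {
        // An external ZooKeeper with a non-empty list passes silently.
        checkExternalZooKeeper(true, "zkNode.net:2081");

        // An unset list now yields a readable error instead of an NPE.
        try {
            checkExternalZooKeeper(true, null);
        } catch (IllegalArgumentException e) {
            System.out.println("Caught: " + e.getMessage());
        }
    }
}
```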