
Heron: Failed to set packing plan for topology 'WordCountTopology'


When I submit WordCountTopology to a Heron cluster deployed with the Aurora scheduler and ZooKeeper, the submission fails with the following error:

yitian@heron01:~$ heron submit aurora/yitian/devel --config-path ~/.heron/conf ~/.heron/examples/heron-api-examples.jar com.twitter.heron.examples.api.WordCountTopology WordCountTopology --deploy-deactivated
[2018-06-04 00:55:54 +0000] [INFO]: Using cluster definition in /home/yitian/.heron/conf/aurora
[2018-06-04 00:55:54 +0000] [INFO]: Launching topology: 'WordCountTopology'
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/uploader/heron-dlog-uploader.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/yitian/.heron/lib/statemgr/heron-zookeeper-statemgr.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.JDK14LoggerFactory]
[2018-06-04 00:55:55 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Starting Curator client connecting to: heron01:2181  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.curator.framework.imps.CuratorFrameworkImpl: Starting  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:host.name=heron01  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.version=1.8.0_151  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.home=/usr/java/jdk1.8.0_151/jre  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.class.path=:/home/yitian/.heron/lib/scheduler/heron-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-binpacking-packing.jar:/home/yitian/.heron/lib/scheduler/heron-yarn-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-slurm-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-mesos-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-marathon-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-kubernetes-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-roundrobin-packing.jar:/home/yitian/.heron/lib/scheduler/heron-local-scheduler.jar:/home/yitian/.heron/lib/scheduler/heron-aurora-scheduler.jar:/home/yitian/.heron/lib/uploader/heron-localfs-uploader.jar:/home/yitian/.heron/lib/uploader/heron-hdfs-uploader.jar:/home/yitian/.heron/lib/uploader/heron-null-uploader.jar:/home/yitian/.heron/lib/uploader/heron-scp-uploader.jar:/home/yitian/.heron/lib/uploader/heron-s3-uploader.jar:/home/yitian/.heron/lib/uploader/heron-gcs-uploader.jar:/home/yitian/.heron/lib/uploader/heron-dlog-uploader.jar:/home/yitian/.heron/lib/statemgr/heron-localfs-statemgr.jar:/home/yitian/.heron/lib/statemgr/heron-zookeeper-statemgr.jar:/home/yitian/.heron/lib/packing/heron-binpacking-packing.jar:/home/yitian/.heron/lib/packing/heron-roundrobin-packing.jar  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.io.tmpdir=/tmp  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:java.compiler=<NA>  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:os.name=Linux  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:os.arch=amd64  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:os.version=4.10.0-28-generic  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:user.name=yitian  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:user.home=/home/yitian  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Client environment:user.dir=/home/yitian  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=heron01:2181 sessionTimeout=30000 watcher=org.apache.curator.ConnectionState@2a17b7b6  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ClientCnxn: Opening socket connection to server heron01/218.195.241.174:2181. Will not attempt to authenticate using SASL (unknown error)  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ClientCnxn: Socket connection established to heron01/218.195.241.174:2181, initiating session  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.zookeeper.ClientCnxn: Session establishment complete on server heron01/218.195.241.174:2181, sessionid = 0x163c9cb80580003, negotiated timeout = 30000  
[2018-06-04 00:55:55 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED  
[2018-06-04 00:55:55 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Directory tree initialized.  
[2018-06-04 00:55:55 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Checking existence of path: /heron/topologies/WordCountTopology  
[2018-06-04 00:55:58 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Target topology file already exists at '/heron/topologies/aurora/WordCountTopology-yitian-tag-0--1244338252177277003.tar.gz'. Overwriting it now  
[2018-06-04 00:55:58 -0700] [INFO] com.twitter.heron.uploader.hdfs.HdfsUploader: Uploading topology package at '/tmp/tmpXlYukj/topology.tar.gz' to target HDFS at '/heron/topologies/aurora/WordCountTopology-yitian-tag-0--1244338252177277003.tar.gz'  
[2018-06-04 00:56:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/WordCountTopology  
[2018-06-04 00:56:02 -0700] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Could not createNode:  
[2018-06-04 00:56:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/topologies/WordCountTopology  
18/06/04 00:56:05 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
Deleted /heron/topologies/aurora/WordCountTopology-yitian-tag-0--1244338252177277003.tar.gz
[2018-06-04 00:56:05 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the CuratorClient to: heron01:2181  
[2018-06-04 00:56:05 -0700] [INFO] org.apache.zookeeper.ZooKeeper: Session: 0x163c9cb80580003 closed  
[2018-06-04 00:56:05 -0700] [INFO] org.apache.zookeeper.ClientCnxn: EventThread shut down  
[2018-06-04 00:56:05 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Closing the tunnel processes  
[2018-06-04 00:56:05 +0000] [ERROR]: Failed to set packing plan for topology 'WordCountTopology'
[2018-06-04 00:56:05 +0000] [ERROR]: Failed to launch topology 'WordCountTopology' 

What's wrong with it? Thanks for your help.


Solution

  • From the logs:

    [2018-06-04 00:56:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/WordCountTopology
    [2018-06-04 00:56:02 -0700] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Could not createNode:

    Heron connects to ZK and checks for the node successfully, so the ZK cluster appears to be running and readable. However, Curator then fails with "Could not createNode" while registering the topology: the log shows /heron/topologies/WordCountTopology being created and then deleted again as the submission is rolled back.

    A Heron topology keeps its runtime state in ZK, so it can't start if the key nodes cannot be created; that is why the submission ends with "Failed to set packing plan". You need to find the cause of the ZK failure (permissions, perhaps?) and fix it.
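
The relevant lines are easy to miss among the INFO output, so here is a minimal Python sketch of the "from the logs" step above: it keeps only WARNING/ERROR lines plus the line just before each one. The function name `find_problems`, the `context` parameter, and the regular expression are illustrative assumptions based on the log format shown in the question; they are not part of Heron.

```python
import re
from typing import List, Tuple

# Assumed line shape, taken from the output in the question:
#   [2018-06-04 00:56:02 -0700] [WARNING] some.Class: message
#   [2018-06-04 00:56:05 +0000] [ERROR]: message
LEVEL_RE = re.compile(r"^\[[^\]]+\]\s+\[(?P<level>[A-Z]+)\]")


def find_problems(lines: List[str], context: int = 1) -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every WARNING/ERROR line,
    together with up to `context` lines immediately before it."""
    keep = set()
    for i, line in enumerate(lines):
        match = LEVEL_RE.match(line)
        if match and match.group("level") in {"WARNING", "ERROR"}:
            keep.update(range(max(0, i - context), i + 1))
    return [(i + 1, lines[i].rstrip()) for i in sorted(keep)]


# A few lines copied from the question's output, used as sample input.
SAMPLE = [
    "[2018-06-04 00:55:55 -0700] [INFO] org.apache.curator.framework.state.ConnectionStateManager: State change: CONNECTED",
    "[2018-06-04 00:56:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Created node for path: /heron/topologies/WordCountTopology",
    "[2018-06-04 00:56:02 -0700] [WARNING] com.twitter.heron.spi.statemgr.SchedulerStateManagerAdaptor: Exception processing future: java.lang.RuntimeException: Could not createNode:",
    "[2018-06-04 00:56:02 -0700] [INFO] com.twitter.heron.statemgr.zookeeper.curator.CuratorStateManager: Deleted node for path: /heron/topologies/WordCountTopology",
    "[2018-06-04 00:56:05 +0000] [ERROR]: Failed to set packing plan for topology 'WordCountTopology'",
]

for number, text in find_problems(SAMPLE):
    print(f"{number}: {text}")
```

To use it on a real run, save the submit output to a file and pass `open(path).read().splitlines()` instead of `SAMPLE`. This only narrows down where the failure happens; the underlying ZK cause still has to be checked on the ZooKeeper side.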