I'm following this guide to configure a Mesos cluster with 3 master nodes and 3 slave nodes. However, when I start ZooKeeper on the masters, I get the following error log:
2017-07-05 09:46:18,568 - INFO [main:FileSnap@83] - Reading snapshot /var/lib/zookeeper/version-2/snapshot.100000016
2017-07-05 09:46:18,606 - ERROR [main:FileTxnSnapLog@210] - Parent /mesos/log_replicas missing for /mesos/log_replicas/0000000002
2017-07-05 09:46:18,607 - ERROR [main:QuorumPeer@453] - Unable to load database on disk
java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /mesos/log_replicas
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /mesos/log_replicas
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
... 6 more
2017-07-05 09:46:18,610 - ERROR [main:QuorumPeerMain@89] - Unexpected exception, exiting abnormally
java.lang.RuntimeException: Unable to run quorum server
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:454)
at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:409)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:151)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:111)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:78)
Caused by: java.io.IOException: Failed to process transaction type: 1 error: KeeperErrorCode = NoNode for /mesos/log_replicas
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:153)
at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:223)
at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:417)
... 4 more
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /mesos/log_replicas
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.processTransaction(FileTxnSnapLog.java:211)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:151)
... 6 more
When the slaves are started, they obviously cannot discover the masters, since they cannot connect to ZooKeeper. The slaves log this error:
I0705 09:33:43.593530 25710 provisioner.cpp:410] Provisioner recovery complete
I0705 09:33:43.593668 25710 slave.cpp:5970] Finished recovery
W0705 09:33:53.529522 25717 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
I0705 09:33:53.530243 25717 group.cpp:510] ZooKeeper session expired
W0705 09:34:03.532635 25710 group.cpp:494] Timed out waiting to connect to ZooKeeper. Forcing ZooKeeper session (sessionId=0) expiration
I0705 09:34:03.533331 25710 group.cpp:510] ZooKeeper session expired
Any ideas on how to troubleshoot this?
Reinstalling the master nodes solved the first problem. I still had the second problem, where the slaves could not find ZooKeeper. The documentation seems to indicate that slaves can discover the master nodes on their own, but that was not working for me. However, once I listed the ZooKeeper nodes in /etc/mesos/zk on each slave, it started working.
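
For anyone hitting the same thing, here is a rough sketch of how the value for /etc/mesos/zk could be put together and how a slave could check that it can reach each ZooKeeper node at all. The hostnames are made up, port 2181 is assumed to be the ZooKeeper client port, and the /mesos path is taken from the znode name in the error log above. The helper names are mine, not part of Mesos or ZooKeeper.

```python
import socket


def build_zk_url(hosts, port=2181, path="/mesos"):
    """Join ZooKeeper hosts into a zk://host:port,host:port/path string."""
    endpoints = ",".join(f"{host}:{port}" for host in hosts)
    return f"zk://{endpoints}{path}"


def can_connect(host, port=2181, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Hypothetical master hostnames; replace with your own.
    masters = [
        "master1.example.com",
        "master2.example.com",
        "master3.example.com",
    ]

    # This is the line that would go into /etc/mesos/zk on each slave.
    print(build_zk_url(masters))

    # Run this on a slave to see which ZooKeeper nodes it can reach.
    for host in masters:
        state = "reachable" if can_connect(host) else "NOT reachable"
        print(f"{host}:2181 is {state}")
```

A successful TCP connection only shows that the port is open, not that the ZooKeeper ensemble is healthy, so the ZooKeeper log on the masters is still worth checking.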