I have a topology running on a Storm cluster with 3 supervisor nodes (32 GB RAM each). For the first several days the topology runs well and everything is OK, but after several days of running the following error always occurs and the topology goes down:
java.lang.RuntimeException: java.lang.RuntimeException: java.lang.RuntimeException
org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /brokers/topics/TOPICNAME/partitions
    at storm.kafka.ZkCoordinator.refresh
The topology uses a spout to consume messages from a Kafka service on a remote server, and the ZooKeeper service runs on that same server.
I guess the cause of this exception is that the ZooKeeper server is unstable, or that the network connection is unstable.
I have no permission to change anything on the remote Kafka/ZooKeeper server, so I need a solution on my side to keep the topology running stably. Is there any way to keep the topology running stably, or to skip the exception when it occurs? Or is there any way to resubmit the topology automatically?
The first thing you should have done is to google what causes the ZooKeeper ConnectionLoss error.
Then look in Storm's log files to see which line of code is causing the error.
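For the log step, a small Java program can pull each ConnectionLoss entry out of a worker log together with the stack frames that follow it. This is a minimal sketch: the class name, the `extractErrors` helper and the default log path `logs/worker-6700.log` are hypothetical, and it assumes the usual Java stack-trace layout where each frame line starts with `at `.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: prints each ZooKeeper ConnectionLoss error found in a
// Storm worker log, followed by the stack frames ("at ..." lines) under it.
class ConnectionLossLogScan {

    static final String MARKER = "KeeperException$ConnectionLossException";

    /** Returns every line containing MARKER plus the consecutive frame lines after it. */
    static List<String> extractErrors(List<String> lines) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            if (!lines.get(i).contains(MARKER)) {
                continue;
            }
            out.add(lines.get(i));
            // Collect the stack-frame lines that directly follow the error line.
            int j = i + 1;
            while (j < lines.size() && lines.get(j).trim().startsWith("at ")) {
                out.add(lines.get(j));
                j++;
            }
            i = j - 1; // resume scanning after the frames just consumed
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // Example path only (assumption): point it at your own worker log.
        Path log = Paths.get(args.length > 0 ? args[0] : "logs/worker-6700.log");
        for (String line : extractErrors(Files.readAllLines(log))) {
            System.out.println(line);
        }
    }
}
```

Run it with something like `java ConnectionLossLogScan.java path/to/worker.log` (Java 11+ single-file launch). The frames printed under each error show which code path, for example `storm.kafka.ZkCoordinator.refresh`, was talking to ZooKeeper when the connection dropped.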
The right fix is to find out what is causing the error.
However, if you want a quicker, temporary solution, use Storm's REST API to kill the topology. Then you can use a normal Java program, or a script in any language, to relaunch the topology from the command line.
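The kill-and-relaunch approach can be sketched as a short Java 11+ program. It assumes the Storm UI REST endpoint `POST /api/v1/topology/<id>/kill/<wait-time>` and the `storm jar <jar> <main-class> [args]` client command. The class name, the 30-second drain time and the 15-second extra wait are illustrative assumptions, and secured or newer Storm UIs may require additional headers on POST requests.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical restarter: kills a topology through the Storm UI REST API,
// waits for the kill to complete, then resubmits it with `storm jar`.
class TopologyRestarter {

    /** Builds the Storm UI REST URL that kills a topology after waitSecs seconds. */
    static String killUrl(String uiBase, String topologyId, int waitSecs) {
        String base = uiBase.endsWith("/") ? uiBase.substring(0, uiBase.length() - 1) : uiBase;
        return base + "/api/v1/topology/" + topologyId + "/kill/" + waitSecs;
    }

    /** Builds the `storm jar <jar> <mainClass> [args...]` command used to resubmit. */
    static List<String> relaunchCommand(String jar, String mainClass, String... topologyArgs) {
        List<String> cmd = new ArrayList<>(Arrays.asList("storm", "jar", jar, mainClass));
        cmd.addAll(Arrays.asList(topologyArgs));
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 4) {
            System.err.println("usage: TopologyRestarter <ui-url> <topology-id> <jar> <main-class> [args...]");
            System.exit(2);
        }
        int waitSecs = 30; // assumption: time Storm gets to drain before killing

        // POST with an empty body to the kill endpoint.
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest kill = HttpRequest.newBuilder(URI.create(killUrl(args[0], args[1], waitSecs)))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> resp = client.send(kill, HttpResponse.BodyHandlers.ofString());
        if (resp.statusCode() >= 300) {
            System.err.println("kill failed (HTTP " + resp.statusCode() + "): " + resp.body());
            System.exit(1);
        }

        // Resubmitting fails while a topology with the same name is still alive,
        // so wait out the kill delay plus a margin (the margin is a guess).
        Thread.sleep((waitSecs + 15) * 1000L);

        List<String> cmd = relaunchCommand(args[2], args[3],
                Arrays.copyOfRange(args, 4, args.length));
        int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
        System.out.println("storm jar exited with " + exit);
        System.exit(exit);
    }
}
```

The REST endpoint takes the topology id shown in the Storm UI, not the topology name, and the id changes on every resubmission, so a periodic job would need to look it up each time (the UI's `/api/v1/topology/summary` endpoint lists ids and names). If you only have the name, `storm kill <topology-name> -w <seconds>` kills by name from the command line instead.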