I upgraded my embedded neo4j DB from 2.3.9 to 3.2.3 in SINGLE mode, and the upgrade completed successfully. After the upgrade, I enabled "HA" mode. When running neo4j as a cluster of three servers, I am facing the issue below.
Each server runs fine in HA mode on its own (i.e. ha.initial_hosts = "ip_address_1:5101"), but if I list all three servers under initial_hosts (as shown in the config), all three servers stop immediately.
Am I missing any configuration? Please suggest.
Config:
neo4j {
# Enable these two options while upgrading neo4j database.
# dbms.allow_format_migration=true
# or weak or strong
cache_type = "weak"
# Reduce the default page cache memory allocation
dbms.memory.pagecache.size="6G"
# Port to listen to for incoming backup requests.
dbms.backup.address = ${local.private-ip}":6367"
# Unique server id for this Neo4j instance
# can not be negative id and must be unique
ha.server_id="1"
# List of other known instances in this cluster
ha.initial_hosts = "ip_1:5101,ip_2:5101,ip_3:5101"
# ha.initial_hosts = "ip_1:5101"
# ha.cluster_server = ${local.private-ip}":5101"
# IP and port for this instance to bind to for communicating cluster information
# with the other neo4j instances in the cluster.
ha.host.coordination = ${local.private-ip}":5101"
# IP and port for this instance to bind to for communicating data with the
# other neo4j instances in the cluster.
ha.host.data = ${local.private-ip}":6365"
# HA - High Availability
# SINGLE - Single mode, default.
dbms.mode="HA"
# HTTP Connector
dbms.connector.http.enabled="true"
dbms.connector.http.listen_address=":7474"
# Bolt connector
dbms.connector.bolt.enabled="true"
dbms.connector.bolt.tls_level="OPTIONAL"
dbms.connector.bolt.listen_address=":7689"
}
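For comparison, here is a minimal sketch of the HA-related part of the same block as it might look on a second server. It assumes the three servers share this file and differ only in ha.server_id (which the comment above says must be unique) and in the address that ${local.private-ip} resolves to. The id "2" is an illustrative value, not taken from the actual setup.

```
neo4j {
  # Must be different on every server, e.g. "1", "2", "3" (illustrative values)
  ha.server_id="2"
  # Assumed to be identical on all three servers
  ha.initial_hosts = "ip_1:5101,ip_2:5101,ip_3:5101"
  # Both addresses resolve to this server's own IP via ${local.private-ip}
  ha.host.coordination = ${local.private-ip}":5101"
  ha.host.data = ${local.private-ip}":6365"
  dbms.mode="HA"
}
```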
From the neo4j debug.log:
2017-10-09 12:35:47.153+0000 ERROR [o.n.k.h.c.m.HighAvailabilityModeSwitcher] Error while trying to switch to slave Cannot find the master among [] with master serverId=1 and uri=ha://ip_address_1:6365?serverId=1
java.lang.IllegalStateException: Cannot find the master among [] with master serverId=1 and uri=ha://ip_address_1:6365?serverId=1
at org.neo4j.kernel.ha.cluster.SwitchToSlave.checkMyStoreIdAndMastersStoreId(SwitchToSlave.java:263)
at org.neo4j.kernel.ha.cluster.SwitchToSlaveBranchThenCopy.checkDataConsistency(SwitchToSlaveBranchThenCopy.java:142)
at org.neo4j.kernel.ha.cluster.SwitchToSlave.executeConsistencyChecks(SwitchToSlave.java:478)
at org.neo4j.kernel.ha.cluster.SwitchToSlave.switchToSlave(SwitchToSlave.java:221)
at org.neo4j.kernel.ha.cluster.modeswitch.HighAvailabilityModeSwitcher$1.run(HighAvailabilityModeSwitcher.java:355)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
at org.neo4j.helpers.NamedThreadFactory$2.run(NamedThreadFactory.java:109)
2017-10-09 12:35:47.154+0000 INFO [o.n.k.h.c.m.HighAvailabilityModeSwitcher] Attempting to switch to slave in 300s
The default value for ha.join_timeout is 30 seconds.
ha.join_timeout=10m