Running with a slightly modified demo docker-compose file taken from here (thanks, GraphAware guys), I got a successful causal cluster running using docker-compose up. I can't get the same thing up using docker swarm, however.
The compose file is the same:
version: '3.3'
networks:
  neonet:
    driver: overlay
    attachable: true
    ipam:
      config:
        - subnet: 10.161.0.0/24
services:
  neo-1:
    image: neo4j:3.3.4-enterprise
    networks:
      - neonet
    volumes:
      - /srv/neo4j/neo4j-core1/data:/data
      - /srv/neo4j/neo4j-core1/logs:/logs
    environment:
      - NEO4J_AUTH=neo4j/blah
      - NEO4J_dbms_mode=CORE
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_causalClustering_expectedCoreClusterSize=3
      - NEO4J_causalClustering_initialDiscoveryMembers=neo-1:5000,neo-2:5000,neo-3:5000
      - NEO4J_dbms_connector_http_listen__address=:7474
      - NEO4J_dbms_connector_https_listen__address=:6477
      - NEO4J_dbms_connector_bolt_listen__address=:7687
  neo-2:
    image: neo4j:3.3.4-enterprise
    networks:
      - neonet
    volumes:
      - /srv/neo4j/neo4j-core2/data:/data
      - /srv/neo4j/neo4j-core2/logs:/logs
    environment:
      - NEO4J_AUTH=neo4j/blah
      - NEO4J_dbms_mode=CORE
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_causalClustering_expectedCoreClusterSize=3
      - NEO4J_causalClustering_initialDiscoveryMembers=neo-1:5000,neo-2:5000,neo-3:5000
      - NEO4J_dbms_connector_http_listen__address=:7474
      - NEO4J_dbms_connector_https_listen__address=:6477
      - NEO4J_dbms_connector_bolt_listen__address=:7687
  neo-3:
    image: neo4j:3.3.4-enterprise
    networks:
      - neonet
    volumes:
      - /srv/neo4j/neo4j-core3/data:/data
      - /srv/neo4j/neo4j-core3/logs:/logs
    environment:
      - NEO4J_AUTH=neo4j/blah
      - NEO4J_dbms_mode=CORE
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_causalClustering_expectedCoreClusterSize=3
      - NEO4J_causalClustering_initialDiscoveryMembers=neo-1:5000,neo-2:5000,neo-3:5000
      - NEO4J_dbms_connector_http_listen__address=:7474
      - NEO4J_dbms_connector_https_listen__address=:6477
      - NEO4J_dbms_connector_bolt_listen__address=:7687
...except that for docker-compose up I specify neither the overlay network details nor the deploy specifics. Both clusters run on a single machine.
If I shell into the container for the standalone docker-compose deployment, the IP address looks OK and port 5000 is 'curlable'; doing the same (curl ip:5000) for the swarm-deployed container results in connection refused.
Running netstat -ntlp inside the swarm-deployed container gives:
/var/lib/neo4j # netstat -ntlp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 10.161.0.166:5000       0.0.0.0:*               LISTEN      -
tcp        0      0 127.0.0.11:44137        0.0.0.0:*               LISTEN      -
tcp        0      0 0.0.0.0:7000            0.0.0.0:*               LISTEN      -
This shows port 5000 listening on an IP address that belongs to no interface on this machine (ifconfig):
eth0      Link encap:Ethernet  HWaddr 02:42:0A:A1:00:A7
          inet addr:10.161.0.167  Bcast:10.161.0.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1450  Metric:1
          RX packets:119 errors:0 dropped:0 overruns:0 frame:0
          TX packets:119 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:7110 (6.9 KiB)  TX bytes:7110 (6.9 KiB)
eth1      Link encap:Ethernet  HWaddr 02:42:AC:12:00:06
          inet addr:172.18.0.6  Bcast:172.18.255.255  Mask:255.255.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:8 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:648 (648.0 B)  TX bytes:0 (0.0 B)
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:58 errors:0 dropped:0 overruns:0 frame:0
          TX packets:58 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:3604 (3.5 KiB)  TX bytes:3604 (3.5 KiB)
As you can see, there are 2 interfaces besides loopback: my neonet network and (I assume) docker's ingress.
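The by-eye comparison of the two outputs can also be scripted. Below is a minimal sketch, assuming the busybox-style netstat and ifconfig formats shown above (`inet addr:` lines, local address in the fourth netstat column); the function name `unbound_listeners` and the temp file paths are made up for illustration.

```shell
#!/bin/sh
# Print every LISTEN socket whose local IP is not assigned to any interface.
# $1: file holding `netstat -ntl` output
# $2: file holding `ifconfig` output (old "inet addr:" format assumed)
unbound_listeners() {
    # Collect the IPv4 addresses that are actually configured on interfaces.
    addrs=$(sed -n 's/.*inet addr:\([0-9.]*\).*/\1/p' "$2")
    # Column 4 of a netstat row is the local address as ip:port.
    awk '$1 == "tcp" && $6 == "LISTEN" { print $4 }' "$1" |
    while IFS= read -r sock; do
        ip=${sock%:*}
        case "$ip" in
            # Wildcard binds and the loopback range are always deliverable locally.
            0.0.0.0|127.*) continue ;;
        esac
        # Report the socket when its IP matches none of the interface addresses.
        echo "$addrs" | grep -qxF "$ip" || echo "$sock"
    done
}

# Inside the container:
#   netstat -ntl > /tmp/netstat.txt
#   ifconfig > /tmp/ifconfig.txt
#   unbound_listeners /tmp/netstat.txt /tmp/ifconfig.txt
```

Fed the two outputs above, this prints only 10.161.0.166:5000, which is the socket in question.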
Furthermore, the generated Neo4j config tells it to listen for discovery on all interfaces:
causal_clustering.transaction_listen_address=0.0.0.0:6000
causal_clustering.transaction_advertised_address=2a9e1683a92e:6000
causal_clustering.raft_listen_address=0.0.0.0:7000
causal_clustering.raft_advertised_address=2a9e1683a92e:7000
causal_clustering.initial_discovery_members=neo1:5000,neo2:5000,neo3:5000
causal_clustering.expected_core_cluster_size=3
causal_clustering.discovery_listen_address=0.0.0.0:5000
causal_clustering.discovery_advertised_address=2a9e1683a92e:5000
EDITION=enterprise
ACCEPT.LICENSE.AGREEMENT=yes
...but it is somehow deciding to listen on one specific IP, which it does for port 5000 but, incidentally, not for 7000.
I'm no networking fundi, but it doesn't look right to listen on an IP that is bound to no interface on this machine.
How do I instruct Neo4j to bind to all interfaces, or at least to a valid one?
Turns out there were multiple fixes, the core one being setting deploy.endpoint_mode: dnsrr to prevent the creation of a docker virtual IP. In the end my working swarm file looks like the one below.
Working = a multi-node Neo4j causal cluster of cores (only), working 100% with the Neo4j OGM v3 client and the connection URL bolt+routing://neo-1:7687. I wasn't brave enough yet to try failing over the initial connection, so neo-1 is (initially) a single point of failure.
version: '3.3'
services:
  neo-1:
    image: neo4j:3.3.4-enterprise
    volumes:
      - neo-data:/data
      - neo-logs:/var/lib/neo4j/logs
    environment:
      - NEO4J_AUTH=neo4j/blah
      - NEO4J_causalClustering_discoveryAdvertisedAddress=neo-1:5000
      - NEO4J_causalClustering_transactionAdvertisedAddress=neo-1:6000
      - NEO4J_causalClustering_raftAdvertisedAddress=neo-1:7000
      - NEO4J_causalClustering_expectedCoreClusterSize=3
      - NEO4J_causalClustering_initialDiscoveryMembers=neo-1:5000,neo-2:5000,neo-3:5000
      - NEO4J_dbms_connectors_default__advertised__address=neo-1
      - NEO4J_dbms_connector_bolt_advertised__address=:7687
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_dbms_mode=CORE
    deploy:
      mode: global
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.labels.neodb == 1
    networks:
      - neonet
  neo-2:
    image: neo4j:3.3.4-enterprise
    volumes:
      - neo-data:/data
      - neo-logs:/var/lib/neo4j/logs
    environment:
      - NEO4J_AUTH=neo4j/blah
      - NEO4J_causalClustering_discoveryAdvertisedAddress=neo-2:5000
      - NEO4J_causalClustering_transactionAdvertisedAddress=neo-2:6000
      - NEO4J_causalClustering_raftAdvertisedAddress=neo-2:7000
      - NEO4J_causalClustering_expectedCoreClusterSize=3
      - NEO4J_causalClustering_initialDiscoveryMembers=neo-1:5000,neo-2:5000,neo-3:5000
      - NEO4J_dbms_connectors_default__advertised__address=neo-2
      - NEO4J_dbms_connector_bolt_advertised__address=:7687
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_dbms_mode=CORE
    deploy:
      mode: global
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.labels.neodb == 2
    networks:
      - neonet
  neo-3:
    image: neo4j:3.3.4-enterprise
    volumes:
      - neo-data:/data
      - neo-logs:/var/lib/neo4j/logs
    environment:
      - NEO4J_AUTH=neo4j/blah
      - NEO4J_causalClustering_discoveryAdvertisedAddress=neo-3:5000
      - NEO4J_causalClustering_transactionAdvertisedAddress=neo-3:6000
      - NEO4J_causalClustering_raftAdvertisedAddress=neo-3:7000
      - NEO4J_causalClustering_expectedCoreClusterSize=3
      - NEO4J_causalClustering_initialDiscoveryMembers=neo-1:5000,neo-2:5000,neo-3:5000
      - NEO4J_dbms_connectors_default__advertised__address=neo-3
      - NEO4J_dbms_connector_bolt_advertised__address=:7687
      - NEO4J_ACCEPT_LICENSE_AGREEMENT=yes
      - NEO4J_dbms_mode=CORE
    deploy:
      mode: global
      endpoint_mode: dnsrr
      placement:
        constraints:
          - node.labels.neodb == 3
    networks:
      - neonet
networks:
  neonet:
    driver: overlay
volumes:
  neo-data:
  neo-logs:
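The placement constraints in this file only match nodes that carry a neodb label, so the labels have to exist before the stack is deployed. A rough sketch of that setup follows; the helper names (`run`, `label_nodes`, `deploy_stack`), the node hostnames, and the stack name `neo` are placeholders of mine, not part of the original setup. By default the script only prints the docker commands; set DRY_RUN=0 on a manager node to execute them.

```shell
#!/bin/sh
# Print the command by default; execute it only when DRY_RUN=0.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Give the Nth node passed in the label neodb=N, so that
# "node.labels.neodb == N" pins service neo-N to that node.
label_nodes() {
    i=1
    for node in "$@"; do
        run docker node update --label-add "neodb=$i" "$node"
        i=$((i + 1))
    done
}

# Deploy the compose file ($1) as a stack ($2, default "neo").
deploy_stack() {
    run docker stack deploy --compose-file "$1" "${2:-neo}"
}

# Example with hypothetical node hostnames:
#   label_nodes node-a node-b node-c
#   deploy_stack docker-compose.yml neo
```

Since each service runs in global mode with a constraint that matches exactly one node, every labelled node ends up with one core.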
I'm pretty sure this compose file is too verbose, and by now there's probably a solution that allows declaring only one service (with multiple replicas).
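To double-check the virtual IP diagnosis, the address that port 5000 was bound to can be compared against the virtual IPs swarm assigned to the service. This is only a sketch: `is_service_vip` is a helper name I made up, and `neo_neo-1` assumes the stack was deployed under the name `neo`. The `docker service inspect --format` call itself is standard; I would expect its output to be empty or null once endpoint_mode: dnsrr is in effect, but I have not verified that here.

```shell
#!/bin/sh
# Succeed if IP $1 is listed as a virtual IP in the JSON passed as $2.
# $2 is expected to be the output of:
#   docker service inspect --format '{{json .Endpoint.VirtualIPs}}' SERVICE
# which in VIP mode looks like [{"NetworkID":"...","Addr":"10.0.0.2/24"}].
is_service_vip() {
    # Match the address followed by "/" so 10.0.0.2 does not match 10.0.0.20.
    printf '%s' "$2" | grep -qF "\"Addr\":\"$1/"
}

# On a manager node (the service name is a placeholder):
#   vips=$(docker service inspect --format '{{json .Endpoint.VirtualIPs}}' neo_neo-1)
#   if is_service_vip 10.161.0.166 "$vips"; then
#       echo "port 5000 was bound to the service VIP"
#   fi
```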