I am trying to set up an Apache Cassandra cluster via Docker. I've managed to create a cluster of up to and including 3 nodes, each started with:
docker run --network cassandra -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=node0 --name "node0" -d cassandra
Whenever I add a 4th node using the same command (just changing the container name), another random node in the cluster crashes, and the newly created node also exits shortly after passing through the UJ (Up/Joining) and DJ (Down/Joining) states.
Here's what I've tried.
docker network create cassandra
docker run --network cassandra -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=node0 --name "node0" -d cassandra
I then waited until the node was up and nodetool status was showing this:
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.2  88.49 KiB  16      100.0%            15573541-fc19-4569-9a43-cb04e49e134f  rack1
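For reference, nodetool can be run from the host through one of the containers, for example:
# check the ring state from inside the node0 container
docker exec node0 nodetool status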
After that, I added two additional nodes to the cluster.
docker run --network cassandra -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=node0 --name "node1" -d cassandra
docker run --network cassandra -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=node0 --name "node2" -d cassandra
I waited until those two joined the cluster and nodetool showed this:
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.2  74.11 KiB  16      64.7%             15573541-fc19-4569-9a43-cb04e49e134f  rack1
UN  172.19.0.4  98.4 KiB   16      76.0%             30afdc85-e863-452c-9031-59803e4b1f11  rack1
UN  172.19.0.3  74.04 KiB  16      59.3%             6d92cf62-65b4-4365-ab28-2d53872605e3  rack1
That seems good! After that, I wanted to add another node to test whether my replication factor was working properly. So, I added another node to the cluster using the same command:
docker run --network cassandra -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch -e CASSANDRA_SEEDS=node0 --name "node3" -d cassandra
When I added this node, node1 crashed immediately. node3 (that's the new one) was briefly in the UJ (Up/Joining) state, then switched to DJ (Down/Joining), and was then removed from the node list. Here are the results from nodetool status, in order:
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.2  74.11 KiB  16      64.7%             15573541-fc19-4569-9a43-cb04e49e134f  rack1
UN  172.19.0.4  74.03 KiB  16      76.0%             30afdc85-e863-452c-9031-59803e4b1f11  rack1
DN  172.19.0.3  74.04 KiB  16      59.3%             6d92cf62-65b4-4365-ab28-2d53872605e3  rack1

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UJ  172.19.0.5  20.75 KiB  16      ?                 2e4a25e4-3c81-4383-9c9f-6326e4043910  rack1
UN  172.19.0.2  74.11 KiB  16      64.7%             15573541-fc19-4569-9a43-cb04e49e134f  rack1
UN  172.19.0.4  74.03 KiB  16      76.0%             30afdc85-e863-452c-9031-59803e4b1f11  rack1
DN  172.19.0.3  74.04 KiB  16      59.3%             6d92cf62-65b4-4365-ab28-2d53872605e3  rack1

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
DJ  172.19.0.5  20.75 KiB  16      ?                 2e4a25e4-3c81-4383-9c9f-6326e4043910  rack1
UN  172.19.0.2  74.11 KiB  16      64.7%             15573541-fc19-4569-9a43-cb04e49e134f  rack1
UN  172.19.0.4  74.03 KiB  16      76.0%             30afdc85-e863-452c-9031-59803e4b1f11  rack1
DN  172.19.0.3  74.04 KiB  16      59.3%             6d92cf62-65b4-4365-ab28-2d53872605e3  rack1

Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.19.0.2  74.11 KiB  16      64.7%             15573541-fc19-4569-9a43-cb04e49e134f  rack1
UN  172.19.0.4  74.03 KiB  16      76.0%             30afdc85-e863-452c-9031-59803e4b1f11  rack1
DN  172.19.0.3  74.04 KiB  16      59.3%             6d92cf62-65b4-4365-ab28-2d53872605e3  rack1
Here are the logs for node1. As you can see, the first item in the log is the confirmation that node2 had connected to the cluster:
https://gist.github.com/janic0/7e464e5c819c37e6ed38819fb3c19eff
Here are the logs for node3 (again, that's the new node):
https://gist.github.com/janic0/0968b7136c3beb3ef76a2379f3cd9be5
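Logs like these can be captured from the host with docker logs, for example:
# dump everything the node wrote to stdout/stderr into a file
docker logs node1 > node1.log 2>&1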
I've investigated it and found that Docker kills these containers with exit code 137 (128 + SIGKILL), which in this case means they were killed for running out of memory.
Each of the nodes used up about 4 GB of RAM, and the fourth node was just enough to force Docker to kill some of the containers. If you do want to host that many nodes on one machine for some reason, you can increase the memory limit in Docker's settings.
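If you want to double-check the cause on your side, Docker exposes both the exit code and the kernel's OOM flag (node1 here is the container that crashed):
# 137 = 128 + SIGKILL; OOMKilled shows whether the OOM killer fired
docker inspect node1 --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}}'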
Yep, I thought something like that was happening.
So I've done something like this before. If you're just going to be doing some local testing and you want a multi-node cluster, I've used Minikube for that. In fact, I put together a repo which has some resources for doing that: https://github.com/aploetz/cassandra_minikube
But another approach, which might be a "quick fix" for you, would be to explicitly adjust the Java heap sizing to something much smaller for each of your nodes. In my Minikube example above, I set:
-Xms512M
-Xmx512M
-Xmn256M
This should create a 1/2 GB heap, which is plenty for local dev or some simple testing. You can set these values in your cassandra-env.sh or jvm-server.options file (depending on your Cassandra version).
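With the official Docker image, one way to apply these without building a custom image is to pass the heap sizes as environment variables; this relies on cassandra-env.sh honoring MAX_HEAP_SIZE and HEAP_NEWSIZE (which map to -Xms/-Xmx and -Xmn respectively), so treat it as a sketch and verify against your image version:
# start a node with an explicit 512 MB heap and 256 MB new generation
docker run --network cassandra \
  -e CASSANDRA_ENDPOINT_SNITCH=GossipingPropertyFileSnitch \
  -e CASSANDRA_SEEDS=node0 \
  -e MAX_HEAP_SIZE=512M \
  -e HEAP_NEWSIZE=256M \
  --name "node0" -d cassandra
Repeated for each node, this keeps the whole four-node cluster within a couple of gigabytes of heap instead of the roughly 16 GB you were seeing.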