I have a question about the Cassandra seed_provider setting. In my environment, three Cassandra nodes need to be set up as a cluster. How should I define it in cassandra.yaml? I'm confused because most tutorials give different answers.
Example:
Host A - 192.168.1.1
Host B - 192.168.1.2
Host C - 192.168.1.3
The following is my current setup for Host A. Is it correct?
What about the configuration for Host B & Host C?
    # any class that implements the SeedProvider interface and has a
    # constructor that takes a Map<String, String> of parameters will do.
    seed_provider:
        # Addresses of hosts that are deemed contact points.
        # Cassandra nodes use this list of hosts to find each other and learn
        # the topology of the ring. You must change this if you are running
        # multiple nodes!
        - class_name: org.apache.cassandra.locator.SimpleSeedProvider
          parameters:
              # seeds is actually a comma-delimited list of addresses.
              # Ex: "<ip1>,<ip2>,<ip3>"
              - seeds: "192.168.1.1,192.168.1.2,192.168.1.3"
For starters, you should not need to change the class_name of the seed_provider. AFAIK, there is only one that ships with Cassandra. It was defined to be "pluggable," to allow for custom seed providers to be written.
For seeds, I don't recommend designating every node in the seed list. If there are only 3 nodes, then just provide 1 or 2. Seed nodes do not bootstrap data, and they require a repair to become consistent after being replaced. This can make adding nodes difficult.
But as far as I see, your current config will work. I would just build the seed list with a max of 2 nodes.
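As a rough sketch of what a two-seed setup could look like for the three hosts in the question: the seed_provider block below would be identical in cassandra.yaml on Host A, Host B, and Host C. Picking Host A and Host B as the seeds is an arbitrary choice for illustration, and the listen_address line is an assumption about the rest of the file, shown only to indicate which setting would normally differ per host.

```yaml
# Sketch only: the same seed_provider block on Host A, Host B, and Host C.
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # Two of the three nodes act as seeds (assumed here: Host A and Host B).
          - seeds: "192.168.1.1,192.168.1.2"

# Assumption: listen_address is the per-host value that changes.
# Shown here for Host C; Host A and Host B would use their own addresses.
listen_address: 192.168.1.3
```

With this layout Host C is not a seed, so it can bootstrap normally when it joins, while Host A and Host B still serve as contact points for each other and for Host C.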
Just remember that there are two main requirements for the seed_list:
seed_list.
Could you explain further what the impact would be if I add all 3 nodes to the seed list? Why would you add only 1 or 2 nodes to the seed list?
Sure, it all goes back to this:
Seed nodes do not bootstrap data
Therefore, designating all 3 nodes in the seed_list on all 3 nodes allows for the following problems:
In these cases, a nodetool repair will need to be run to get the initial data onto the newly added node.