Search code examples
javadatabasecassandradatastaxgossip

Questions on Cassandra Bootstrap


Could someone please respond to my below questions,

1) I have the 4 nodes 172.30.56.60, 172.30.56.61, 172.30.56.62, 172.30.56.63 and also I have configured the seeds as '172.30.56.60, 172.30.56.61' in cassandra.yaml in all the four nodes. NOTE : I didn't give any information about the '172.30.56.62, 172.30.56.63' in the cassandra.yaml file. But when I start the Cassandra in all the four nodes, How does Cassandra has the ability to identify the 62 and 63 ?

2) How exactly gossip protocol work / How exactly Cassandra bootstrap works?

Thanks,
Harry


Solution

  • (Disclaimer: I'm a Scylla employee)

    When you start Cassandra / Scylla on your nodes, they contact the seed/s nodes (which you defined in the yaml file for all 4 nodes) to get the information about the ring, token ranges and the other members in the ring (other nodes).

    Bootstrap controls the ability for the data in cluster to be automatically redistributed when a new node is inserted. The new node joining the cluster is defined as an empty node without system tables or data.

    • Contact the seed nodes to learn about gossip state.
    • Transition to Up and Joining state (to indicate it is joining the cluster; represented by UJ in the nodetool status).
    • Contact the seed nodes to ensure schema agreement.
    • Calculate the tokens that it will become responsible for.
    • Stream replica data associated with the tokens it is responsible for from the former owners.
    • Transition to Up and Normal state once streaming is complete (to indicate it is now part of the cluster; represented by UN in the nodetool status).

    You can read more about bootstrapping here: http://thelastpickle.com/blog/2017/05/23/auto-bootstrapping-part1.html

    The gossip protocol makes sure every node in the system eventually knows important information about every other node's state, including those that are unreachable or not yet in the cluster when any given state change occurs. Gossip timer task runs every second. During each of these runs the node initiates gossip exchange according to following rules:

    1. Gossip to random live endpoint (if any)
    2. Gossip to random unreachable endpoint with certain probability depending on number of unreachable and live nodes
    3. If the node gossiped to at (1) was not seed, or the number of live nodes is less than number of seeds, gossip to random seed with certain probability depending on number of unreachable, seed and live nodes.

    These rules ensure that if the network is up, all nodes will eventually know about all other nodes. (Clearly, if each node only contacts one seed and then gossips only to random nodes it knows about, you can have partitions when there are multiple seeds -- each seed will only know about a subset of the nodes in the cluster. Step 3 avoids this and more subtle problems.)

    This way a node initiates gossip exchange with one to three nodes every round (or zero if it is alone in the cluster)

    You can read more about gossip high-level architecture here: https://wiki.apache.org/cassandra/ArchitectureGossip

    You can read more about Scylla gossip implementation here: https://github.com/scylladb/scylla/wiki/Gossip