
Is it possible to add members to Aeron Cluster w/o reconfiguring existing ones?


I'm looking for a way to add new members to an existing Aeron Cluster without reconfiguring the existing ones.

It seems cluster members are defined statically during startup as described in the Cluster Tutorial:

final ConsensusModule.Context consensusModuleContext = new ConsensusModule.Context()
    .errorHandler(errorHandler("Consensus Module"))
    .clusterMemberId(nodeId)
    .clusterMembers(clusterMembers(Arrays.asList(hostnames))) // <------ HERE
    .clusterDir(new File(baseDir, "consensus-module"))
    .ingressChannel("aeron:udp?term-length=64k")
    .logChannel(logControlChannel(nodeId, hostname, LOG_CONTROL_PORT_OFFSET))
    .replicationChannel(logReplicationChannel(hostname))
    .archiveContext(aeronArchiveContext.clone());

If I understand this correctly, adding more nodes means reconfiguring each existing node to include the new member.

Moreover, I found this in the Aeron Cookbook (emphasis mine):

Key aspects of Raft:

  • there is a Strong Leader, which means that all log entries flow from the leader to followers
  • Raft makes use of randomized timers to elect leaders. This adds a few milliseconds to failover, but reduces the time to agree an elected leader (in Aeron Cluster, this is a maximum of the election timeout * 2).
  • the Raft protocol allows runtime configuration changes (i.e. adding new or removing nodes at runtime). At the time of writing, this feature is still pending in Aeron Cluster.

However, I do see classes like io.aeron.cluster.DynamicJoin and its usage in io.aeron.cluster.ConsensusModuleAgent, which makes me think that adding nodes dynamically is possible and that the cookbook is perhaps outdated.

Do you know a way to join more nodes without touching existing ones?


Solution

  • Yes, it is possible! The context should be built like this:

    ConsensusModule.Context()
        .errorHandler(errorHandler("Consensus Module"))
        .clusterMemberId(Aeron.NULL_VALUE) // <1>
        .clusterMembers("") // <2>
        .memberEndpoints(memberEndpoints(hostnames[nodeId], nodeId)) // <3>
        .clusterConsensusEndpoints(consensusEndpoints(hostnames)) // <4>
        .clusterDir(File(baseDir, "consensus-module"))
        .ingressChannel("aeron:udp?term-length=64k")
        .logChannel("aeron:udp?term-length=64k")
        .replicationChannel(logReplicationChannel(hostname))
        .archiveContext(aeronArchiveContext.clone())
    
    1. clusterMemberId must be set to Aeron.NULL_VALUE. The member ID will be generated automatically.
    2. clusterMembers should be empty. Static members are not required for a dynamic node.
    3. memberEndpoints is the endpoint configuration of this node. The format is ingress:port,consensus:port,log:port,catchup:port,archive:port. It is very similar to the static clusterMembers entry for a single node, but without the member ID in front.
    4. clusterConsensusEndpoints is a comma-separated list of the consensus:port endpoints of known cluster members. I think of it as a "bootstrap" list of hosts to join.
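The memberEndpoints(...) and consensusEndpoints(...) helpers used in the context above are not shown in the answer. Below is a minimal sketch of how they might assemble the two strings described in points 3 and 4. The class name, host names, and port layout (PORT_BASE, PORTS_PER_NODE, and the offsets) are all assumptions made for illustration; only the field order ingress, consensus, log, catchup, archive comes from the description above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class; not part of Aeron.
public final class DynamicJoinEndpoints {
    // Assumed port layout, chosen only for this example.
    static final int PORT_BASE = 9000;
    static final int PORTS_PER_NODE = 100;
    static final int ARCHIVE_OFFSET = 1;
    static final int INGRESS_OFFSET = 2;
    static final int CONSENSUS_OFFSET = 3;
    static final int LOG_OFFSET = 4;
    static final int CATCHUP_OFFSET = 5;

    private DynamicJoinEndpoints() {
    }

    // Each node gets its own block of ports so several nodes can share one host.
    public static int port(final int nodeIndex, final int offset) {
        return PORT_BASE + (nodeIndex * PORTS_PER_NODE) + offset;
    }

    // Builds "ingress:port,consensus:port,log:port,catchup:port,archive:port"
    // for the joining node. Note that there is no member ID in front.
    public static String memberEndpoints(final String hostname, final int nodeIndex) {
        return String.join(",",
            hostname + ":" + port(nodeIndex, INGRESS_OFFSET),
            hostname + ":" + port(nodeIndex, CONSENSUS_OFFSET),
            hostname + ":" + port(nodeIndex, LOG_OFFSET),
            hostname + ":" + port(nodeIndex, CATCHUP_OFFSET),
            hostname + ":" + port(nodeIndex, ARCHIVE_OFFSET));
    }

    // Builds the comma-separated "bootstrap" list of consensus endpoints
    // for the members that are already known.
    public static String consensusEndpoints(final String[] knownHostnames) {
        final List<String> endpoints = new ArrayList<>();
        for (int i = 0; i < knownHostnames.length; i++) {
            endpoints.add(knownHostnames[i] + ":" + port(i, CONSENSUS_OFFSET));
        }
        return String.join(",", endpoints);
    }

    public static void main(final String[] args) {
        // Three existing members and a fourth node that wants to join.
        final String[] existing = {"node0", "node1", "node2"};
        System.out.println(memberEndpoints("node3", 3));
        System.out.println(consensusEndpoints(existing));
    }
}
```

With these assumed ports, the joining node would pass the first string to memberEndpoints and the second to clusterConsensusEndpoints. Whether the bootstrap list should also contain the joining node itself is not stated in the answer, so this sketch lists only the existing members.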