java cassandra cluster-computing pelops

Two clusters, each with 12 nodes - Cassandra database


I have started working on a project that uses a Cassandra database.

Our production DBAs have set up two clusters, each with 12 nodes.

I will be using the Pelops client to read data from Cassandra. What is the best way to create the Cluster object with Pelops, and specifically, how many nodes should I pass to Pelops when creating the cluster?

My understanding was that I should create the cluster with all 24 nodes, since I have two clusters of 12 nodes each. Is this the right approach?

If not, how do I decide which nodes (from each cluster) to add when creating the cluster with the Pelops client?

String[] nodes = { /* which nodes from the two clusters should go here, and how many? */ };
int port = cfg.getInt("cassandra.port");
boolean dynamicND = true; // dynamic node discovery
Config casconf = new Config(port, true, 0);
Cluster cluster = new Cluster(nodes, casconf, dynamicND);
Pelops.addPool(Const.CASSANDRA_POOL, cluster, Const.CASSANDRA_KS);

Can anyone help me out with this?

Any help will be appreciated.


Solution

  • I'll try to better explain the comment in the other post. My tip is to give Pelops just the seeds of each cluster. Assuming you have 2 seeds for each cluster, I'd use these 4 nodes to create my pool.

    You are using Pelops. From the documentation by Dominic Williams (the creator of Pelops):

    To create a pool, you need to specify a name, a list of known contact nodes (the library can automatically detect further nodes in the cluster, but see notes at the end), the network port that the nodes are listening on, and a policy which controls things like the number of connections in your pool.

    So it's not necessary to pass the whole list of nodes.

    From Cassandra documentation:

    Cassandra nodes exchange information about one another using a mechanism called Gossip, but to get the ball rolling a newly started node needs to know of at least one other, this is called a Seed. It's customary to pick a small number of relatively stable nodes to serve as your seeds, but there is no hard-and-fast rule here

    I've been in production with Pelops for 3 years (started with Cassandra 0.6, now on 1.0.6), and the approach of using just the seeds as the list of nodes works fine!
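As a minimal sketch of the seed-only idea, assuming two seeds per cluster: the helper below merges the seed hosts into one de-duplicated array that could be used as the `nodes` value from the question. The class name, method name, and host names are hypothetical, and the code deliberately does not call Pelops itself.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SeedContactNodes {

    /**
     * Merges the seed hosts of both clusters into one contact list,
     * keeping the original order and dropping duplicates.
     */
    public static String[] contactNodes(List<String> clusterASeeds, List<String> clusterBSeeds) {
        Set<String> merged = new LinkedHashSet<>();
        merged.addAll(clusterASeeds);
        merged.addAll(clusterBSeeds);
        return merged.toArray(new String[0]);
    }

    public static void main(String[] args) {
        // Hypothetical seed host names; use the seeds your DBAs configured.
        String[] nodes = contactNodes(
                List.of("cass-a-seed1", "cass-a-seed2"),
                List.of("cass-b-seed1", "cass-b-seed2"));

        // This array would take the place of `nodes` in the question's
        // `new Cluster(nodes, casconf, dynamicND)` call.
        System.out.println(String.join(", ", nodes));
    }
}
```

With dynamic node discovery enabled, as in the question, the remaining nodes do not have to be listed by hand.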

    A couple of tips:

    1: If you are worried about connections and ring health, you can write a class that makes "random queries" at the EACH_QUORUM consistency level to check connections, and a class that uses the nodetool Java classes to check the health of the ring.

    2: If you perform deletes in Cassandra, remember the importance of nodetool repair (http://wiki.apache.org/cassandra/Operations#Repairing_missing_or_inconsistent_data).
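The connection check from tip 1 could be structured roughly like this. It is only a sketch: the query itself is passed in as a `Callable`, which in a real setup would be a cheap Pelops read at EACH_QUORUM against some random key. The class and method names are hypothetical, and `main` uses a simulated query rather than a real Cassandra call.

```java
import java.util.concurrent.Callable;

public class ConnectionCheck {

    /**
     * Runs the given query `attempts` times and returns how many runs
     * failed with an exception.
     */
    public static int failedQueries(int attempts, Callable<?> query) {
        int failures = 0;
        for (int i = 0; i < attempts; i++) {
            try {
                query.call();
            } catch (Exception e) {
                failures++; // a real check would also log the cause
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Simulated query that fails on every second call; replace it with
        // a real read through your Pelops pool.
        int[] calls = {0};
        int failures = failedQueries(4, () -> {
            if (++calls[0] % 2 == 0) {
                throw new IllegalStateException("simulated timeout");
            }
            return "row";
        });
        System.out.println("failed queries: " + failures);
    }
}
```

A non-zero failure count is a signal to go look at the ring, not a diagnosis in itself.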

    Regards, Carlo