amazon-web-services, cassandra, datastax, amazon-ec2

Cassandra only using AWS Spot Instances


Does anyone have any thoughts on putting together a fleet of Cassandra instances (including seeds) that relies solely on AWS Spot Instances and Elastic IP addresses? Keep in mind that this is a personal POC project, so I'm trying to do it as cost-effectively as possible.

If the cluster has 2 seed nodes and 4 non-seed nodes, could I create something that resembles the following? For seed nodes:

  • Use 2 separate Auto Scaling Groups (ASGs), each with min and max set to 1, and have each instance assign itself an Elastic IP (EIP), probably via userdata on startup.
  • Seed nodes bid a higher spot price than the non-seed nodes.
  • Seed nodes always start with a publicly assigned IP address so they can reach the AWS API and associate the EIP with themselves.
  • A seed node is exactly like a non-seed node except that it runs the EIP association script.
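
To make the EIP step concrete, here is a minimal sketch of what the userdata script could call, assuming boto3 is installed on the instance, the instance role allows `ec2:AssociateAddress`, a default region is configured, and the EIP is a VPC address. The function names and the `eipalloc-EXAMPLE` allocation ID are hypothetical placeholders, not part of any existing setup.

```python
import urllib.request

# Instance metadata service endpoints (IMDSv2), reachable only from inside EC2.
TOKEN_URL = "http://169.254.169.254/latest/api/token"
INSTANCE_ID_URL = "http://169.254.169.254/latest/meta-data/instance-id"


def fetch_instance_id(timeout=2):
    """Ask the instance metadata service for this instance's ID."""
    token_req = urllib.request.Request(
        TOKEN_URL,
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_req, timeout=timeout) as resp:
        token = resp.read().decode()
    id_req = urllib.request.Request(
        INSTANCE_ID_URL, headers={"X-aws-ec2-metadata-token": token}
    )
    with urllib.request.urlopen(id_req, timeout=timeout) as resp:
        return resp.read().decode()


def claim_seed_eip(ec2_client, allocation_id, instance_id):
    """Attach the seed EIP to this instance, taking it from any previous holder."""
    response = ec2_client.associate_address(
        AllocationId=allocation_id,
        InstanceId=instance_id,
        AllowReassociation=True,  # lets a replacement seed take over the address
    )
    return response["AssociationId"]


def main():
    # Hypothetical entry point for the userdata script.
    import boto3  # third-party; assumed to be present on the AMI

    ec2 = boto3.client("ec2")
    # "eipalloc-EXAMPLE" is a placeholder for this seed's allocation ID.
    print(claim_seed_eip(ec2, "eipalloc-EXAMPLE", fetch_instance_id()))
```

Each seed ASG would pass its own allocation ID, so a replacement instance always ends up with the same address. Note that the instance's network interface still only carries the private IP after association, so Cassandra's `listen_address` and `broadcast_address` would probably need to be set accordingly.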

For non-seed nodes:

  • Use an Auto Scaling Group with both the minimum and the desired count set to 4.
  • Set the seed addresses in the cassandra.yaml file to point at the seed nodes' Elastic IP addresses.
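
As a rough illustration of the cassandra.yaml step, the sketch below renders a `seed_provider` section from a list of seed addresses, which a userdata script could then write into the config. `render_seed_provider` is a hypothetical helper, and the two addresses are documentation-range placeholders standing in for the real Elastic IPs.

```python
def render_seed_provider(seed_ips):
    """Return the seed_provider section of cassandra.yaml for the given seeds."""
    if not seed_ips:
        raise ValueError("at least one seed address is required")
    # SimpleSeedProvider expects a single comma-separated string of addresses.
    seeds = ",".join(seed_ips)
    return (
        "seed_provider:\n"
        "    - class_name: org.apache.cassandra.locator.SimpleSeedProvider\n"
        "      parameters:\n"
        f'          - seeds: "{seeds}"\n'
    )


# Placeholder addresses; substitute the two seed EIPs.
print(render_seed_provider(["203.0.113.10", "203.0.113.11"]))
```

Because the seed list only names the EIPs, the same rendered section could be baked into every node's configuration, seed or not.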

And for starter seed nodes:

  • The first couple of seed nodes (the "starter" seed nodes) will probably be created outside of the ASGs to kick off the process.
  • Once those starter seed nodes are set up and talking, I plan to spawn the 2 seed-node ASGs, whose instances will reassign the EIPs to themselves and take over the role of seed nodes.
  • Destroy the starter seed nodes once the ASG seed nodes have taken over.

I'm familiar with AWS and the scripting needed to make that happen, but I am very new to Cassandra, so:

  1. Is my proposal possible?
  2. Am I missing some glaring technological limitations with Cassandra that will cause problems in the future?
  3. Will this work with DataStax OpsCenter?
  4. Will the cleanup of old nodes happen automatically when the ASGs scale up (or down)?
  5. When a new seed node comes online in the future, will reassigning the EIP to itself interfere with its ability to sync with the cluster?

Things I have considered

  • In case the entire fleet fails, I plan to run Netflix Priam to take backups at 30-minute intervals.
  • If it works in this POC, it will be rolled out to multiple AZs and regions.
  • In production I would keep the config identical but run the nodes as On-Demand instances.

Thanks for your help, and for any reference material that would help make this happen.

Cheers


Solution

  • You can use Septaz (https://www.septaz.com) to run Cassandra or any other distributed system on spot instances reliably.