I need to deploy Cassandra on AWS but am confused as to what type of AWS storage is most suitable for Cassandra.
The Datastax documentation here:
http://docs.datastax.com/en/cassandra/3.0/cassandra/planning/planPlanningEC2.html
says that EBS volumes are recommended. At the same time the Datastax AMI documentation:
http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installAMI.html
says that:
Uses RAID0 ephemeral disks for data storage and commit logs.
Launches EBS-backed instances for faster start-up, not database storage.
So which one is the recommended storage type for Cassandra? The EBS storage or the Instance storage?
Many of the new eC2 instances are EBS only (http://www.ec2instances.info/) I am not sure when the cassandra document was written but EBS disk have improved a lot recently and amazon launches new type frequently, so you will be able to find what you're looking for with one of the type
You can check https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/EBSVolumeTypes.html?icmpid=docs_ec2_console and its recommended Provisioned IOPS SSD (io1)
To add a reason why AWS is moving to EBS and why it would be good for cassandra data is because of ephemeral type of data, you might not want your data to disappear if your instance is terminated (because of a crash or a stop you made) at least when your instance is gone, you still have access to your data and can attach the EBS volume to a new instance (really useful also when up/down-grading instances)