I am looking for documentation or general guidelines on when more Cassandra servers should be added to a ring. Should this be based on disk usage or other monitoring factors?
Currently I have some concerns about CoordinatorReadLatency, ReadLatency, and DroppedMessages.REQUEST_RESPONSE, but again I cannot find a good guide on how to interpret the various components that I am monitoring. I can find good guides on performance tuning, but only limited information on devops.
I understand that this question may be more relevant to Server Fault, but they don't have tags for Datastax Enterprise.
Thanks in advance
Next steps based on @bcoverston's response:
Nodetool provides access to read and write latency metrics: nodetool cfhistograms
See docs here: http://www.datastax.com/documentation/cassandra/2.0/cassandra/tools/toolsCFhisto.html?scroll=toolsCFhisto#
Since we want to tie this into pretty graphs, the nodetool source code points us to the right JMX values:
https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/tools/NodeTool.java#L82
Each column family (cf) has write and read latency metrics.
The question is a little open-ended, and the answer depends on your use case. There are a lot of things to monitor, and it can be overwhelming to look at every possible setting and decide whether you need to increase your cluster size.
The general advice here is that you should monitor your read and write latency, decide where your thresholds should be, and plan your capacity accordingly. Because there is no prescriptive hardware for running Cassandra, and your use case can be unique to whatever you're doing, there are only rules of thumb.
Sizing your cluster based on data per node can be helpful, but only if you know how big your working set is and what your latency targets are. In addition, the speed of your storage media matters.
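As a rough illustration of the data-per-node rule of thumb, here is a minimal sketch. The function name, the inputs, and the figures in the usage note are hypothetical; the only Cassandra-specific assumption is that each row is stored once per replica, so the replication factor multiplies the raw data size. It ignores compaction headroom, working set size, and storage speed, so treat the result as a lower bound rather than a recommendation.

```python
import math

def estimate_node_count(raw_data_gb, replication_factor, usable_gb_per_node):
    """Estimate the minimum number of nodes needed to hold the data.

    All three inputs are assumptions you supply yourself:
    raw_data_gb        -- size of one copy of the data
    replication_factor -- number of replicas kept of each row
    usable_gb_per_node -- how much data you are willing to put on one node
    """
    if usable_gb_per_node <= 0:
        raise ValueError("usable_gb_per_node must be positive")
    total_gb = raw_data_gb * replication_factor
    # Round up: a fraction of a node still requires a whole node.
    return math.ceil(total_gb / usable_gb_per_node)
```

For example, estimate_node_count(1000, 3, 500) gives 6: 1000 GB stored three times is 3000 GB, which needs six nodes at 500 GB each.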
Sizing your cluster based on latency makes more sense. If you need to do N tx/second, you can test your hardware against your workload and see whether it can meet your targets. Keep in mind that when you do this you'll want to run a long-term test to see whether those targets hold up under sustained load, and, if performance degrades, how long it takes to do so (a write-heavy workload will degrade over time, and you'll want to add capacity before you start missing your targets).
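To make the long-term test concrete, here is a minimal sketch of checking a latency target across successive time windows. It assumes you have already collected latency samples in milliseconds (for example from the JMX metrics mentioned in the question) and grouped them by window; the function names, the 99th percentile default, and the nearest-rank percentile method are illustrative choices, not something Cassandra prescribes.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # Nearest-rank: the smallest value with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]

def first_window_over_target(windows, target_ms, pct=99):
    """Return the index of the first window whose pct-th percentile
    latency exceeds target_ms, or None if every window meets the target.

    windows is a list of lists, one list of latency samples (ms) per
    time window, in chronological order.
    """
    for index, samples in enumerate(windows):
        if percentile(samples, pct) > target_ms:
            return index
    return None
```

If the windows are, say, hourly, the returned index tells you roughly how many hours of sustained load the cluster handled before missing the target, which is the point by which you would want extra capacity in place.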