Performance metrics in NoSQL databases


I am trying to benchmark a NoSQL database (Cassandra) using the YCSB benchmarking tool. To do that, I need to choose the performance metrics I will measure. I am planning to measure read, write, and update performance. However, I am unsure whether these are the right metrics to choose, or whether something like scale-up and/or elastic speedup would be worth measuring as well. Any suggestions would be appreciated.


Solution

  • You mentioned operation latency (read, write, update). This is definitely a very important metric, so you should design tests that show how the latency changes in these scenarios:

    • Operation latency, for varying loads (operations per second).
    • Operation latency, for varying workloads (consider different mixes or percentages of operations in the workload).
    • (Less important) Operation latency for varying key popularity distributions.
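The three latency experiments above amount to a parameter sweep, which can be sketched as a small script that builds one YCSB invocation per combination. This is only a sketch under assumptions: the function name `ycsb_run_commands` is hypothetical, and the `-target` flag, the `cassandra-cql` binding name, and the `readproportion`, `updateproportion`, and `requestdistribution` properties are taken from YCSB's core workload conventions as I understand them, so check them against the YCSB version you have installed. The script only builds the command lines; it does not run them.

```python
from itertools import product


def ycsb_run_commands(targets, read_proportions, distributions,
                      binding="cassandra-cql",
                      workload="workloads/workloada"):
    """Build one `ycsb run` command per (target load, read mix, key distribution).

    All defaults are assumptions for illustration; adjust them to your setup.
    """
    commands = []
    for target, read_p, dist in product(targets, read_proportions, distributions):
        # Treat everything that is not a read as an update in this sketch.
        update_p = round(1.0 - read_p, 2)
        commands.append([
            "bin/ycsb", "run", binding,
            "-P", workload,
            "-target", str(target),                   # offered load in ops/sec
            "-p", f"readproportion={read_p}",         # workload mix
            "-p", f"updateproportion={update_p}",
            "-p", f"requestdistribution={dist}",      # key popularity
        ])
    return commands


if __name__ == "__main__":
    sweep = ycsb_run_commands(
        targets=[1000, 5000],
        read_proportions=[0.5, 0.95],
        distributions=["uniform", "zipfian"],
    )
    for cmd in sweep:
        print(" ".join(cmd))
```

Varying one parameter at a time while holding the others fixed usually makes the resulting latency curves easier to interpret than a full cross product.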

    In addition, other things that you can test are:

    • Elastic speedup: Impact (on operation latency) of adding servers online.
    • Fault tolerance: Impact (on operation latency) of having random servers fail.
    • Load balance: How well the DB balances the load across the servers, considering different key popularity distributions and different temporal locality in the workloads.
    • Scalability: How having more or fewer nodes affects operation latency. In this case, the servers are NOT added online (that would be the elastic speedup experiment).
    • Instance type: If you are running your experiments on EC2, how the choice of EC2 instance type (medium, large, etc.) affects performance.

    Also consider using histograms or box plots to show the effect on latency, since plotting only averages hides the variability in the latency.
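To illustrate why averages alone can mislead, here is a minimal sketch that summarizes a list of latency samples with the quantities a box plot shows (median, interquartile range) plus tail percentiles. The function name `summarize` and the sample values are made up for illustration; in practice the samples would come from your benchmark output.

```python
import statistics


def summarize(latencies_ms):
    """Return the mean plus spread and tail statistics for latency samples."""
    q1, median, q3 = statistics.quantiles(latencies_ms, n=4)
    percentiles = statistics.quantiles(latencies_ms, n=100)  # 99 cut points
    return {
        "mean": statistics.mean(latencies_ms),
        "median": median,
        "iqr": q3 - q1,              # height of the box in a box plot
        "p95": percentiles[94],
        "p99": percentiles[98],
        "max": max(latencies_ms),
    }


if __name__ == "__main__":
    # Two hypothetical runs with the same mean but very different tails.
    steady = [10] * 100
    spiky = [5] * 95 + [105] * 5
    for name, samples in (("steady", steady), ("spiky", spiky)):
        print(name, summarize(samples))
```

Both runs have the same mean, yet the second one has a much worse tail, which is exactly what a histogram or box plot would reveal and a plot of averages would not.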

    Finally, take a look at this VLDB paper for more ideas.