I recently came across RRD while trying out the Ganglia monitoring system, which stores its monitoring data in RRD. I am wondering how RRD works from a scalability perspective. What if I have a potentially huge amount of data to store? As in the Ganglia case, if I want to keep all historical monitoring statistics instead of only recent data with a specific TTL, will RRD be good enough to cope with that?
Can someone who has used RRD share some experience on how RRD scales, and how it compares to an RDBMS or even Bigtable?
RRD is designed to automatically blur (average out) your data over time, so that the total size of the database stays constant (the file is preallocated when it is created), even as new data keeps arriving.
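To make the "blur over time" idea concrete, here is a minimal sketch of an RRD-like store in plain Python. It is not rrdtool's file format or API: the class names `RoundRobinArchive` and `TinyRRD` are hypothetical, and it ignores things real RRDs handle (timestamps, unknown values, the xff setting). It keeps a small archive of raw samples plus a coarser archive of averages, and both are fixed-size circular buffers, so old rows are overwritten rather than appended.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-capacity archive; the oldest row is overwritten when full."""

    def __init__(self, rows):
        # deque with maxlen silently drops the oldest entry on overflow,
        # mimicking a circular (round-robin) buffer
        self.rows = deque(maxlen=rows)

    def push(self, value):
        self.rows.append(value)


class TinyRRD:
    """Hypothetical two-level RRD-like store (not rrdtool's real format).

    fine:   the last `fine_rows` raw samples
    coarse: the last `coarse_rows` averages, each over `steps` raw samples
    """

    def __init__(self, fine_rows, coarse_rows, steps):
        self.fine = RoundRobinArchive(fine_rows)
        self.coarse = RoundRobinArchive(coarse_rows)
        self.steps = steps
        self._pending = []  # raw samples not yet consolidated

    def update(self, value):
        self.fine.push(value)
        self._pending.append(value)
        if len(self._pending) == self.steps:
            # AVERAGE consolidation: `steps` samples collapse into one row
            self.coarse.push(sum(self._pending) / self.steps)
            self._pending.clear()

    def size(self):
        # Capacity never grows, no matter how many updates arrive
        return self.fine.rows.maxlen + self.coarse.rows.maxlen


db = TinyRRD(fine_rows=4, coarse_rows=3, steps=4)
for v in range(1, 101):  # 100 samples: 1, 2, ..., 100
    db.update(v)

print(list(db.fine.rows))    # [97, 98, 99, 100]
print(list(db.coarse.rows))  # [90.5, 94.5, 98.5]
print(db.size())             # 7
```

After 100 updates the store still holds only 7 values: the last few raw samples and a few averages. An individual older sample such as 37 can no longer be read back; only the averages covering that period survive, and eventually those are overwritten too.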
So it is only a good fit if you want some historical data and are willing to lose precision as that data ages.
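Because the size is fixed when the file is created, you can estimate it up front. The sketch below assumes a hypothetical layout for one metric sampled every 5 minutes (5-minute averages for a day, 30-minute averages for a week, daily averages for a year) and builds the matching `RRA:AVERAGE:xff:steps:rows` definitions. It counts only the data rows at 8 bytes per value (rrdtool stores values as doubles) and ignores the file header, so treat the result as an approximation.

```python
STEP = 300            # base step in seconds (one primary data point every 5 min)
BYTES_PER_VALUE = 8   # each stored value is a double

# Hypothetical retention plan: (seconds per row, seconds of history kept)
archives = [
    (300, 24 * 3600),           # 5-minute averages for 1 day
    (1800, 7 * 24 * 3600),      # 30-minute averages for 1 week
    (86400, 365 * 24 * 3600),   # daily averages for 1 year
]


def rra_definitions(archives, xff=0.5):
    """Translate (resolution, retention) pairs into RRA definition strings."""
    defs = []
    for resolution, retention in archives:
        steps = resolution // STEP     # primary data points per row
        rows = retention // resolution
        defs.append(f"RRA:AVERAGE:{xff}:{steps}:{rows}")
    return defs


def data_bytes(archives, data_sources=1):
    """Approximate size of the data area (header not included)."""
    rows = sum(retention // resolution for resolution, retention in archives)
    return rows * data_sources * BYTES_PER_VALUE


print(rra_definitions(archives))
# ['RRA:AVERAGE:0.5:1:288', 'RRA:AVERAGE:0.5:6:336', 'RRA:AVERAGE:0.5:288:365']
print(data_bytes(archives))  # 7912
```

The figure of roughly 8 KB per data source does not change whether you collect for a month or for ten years; what you give up is resolution, since after a week only one value per day remains.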
In other words, you cannot really compare RRD to standard SQL databases or to Bigtable, because SQL and NoSQL databases store data precisely: you read back exactly what was written.
With RRDtool, however, there is no such guarantee. But its speed makes it an attractive choice for all kinds of monitoring solutions, where it is mainly the recent data that matters.