RRD Time series data


I work for a company that receives data from smart meters. Data on the live stream can be up to 2 days old, and it may be back-populated later to correct errors (gaps etc.). We typically store it for 5 years. The data is then pulled into an SSAS cube and aggregated at 1 minute, 5 minute, 30 minute, 1 hour, 1 day, 1 week and 1 month intervals. For each of these aggregations the Min, Max and Avg are also stored. Building this cube is slow and does not currently scale, since it mines its data from a single source.

I think an RRD-style database per data point, driven by the data push, would be a better fit. However, I have a couple of questions about RRD (examples would be most welcome):

  1. Can RRD retain data granularity whilst also performing roll up over time?
  2. Can data be fed into RRD to correct gaps?

Examples would be welcome.


Solution

    1. Yes, provided you configure your RRAs appropriately.

    An RRA is a round-robin archive; each one defines a resolution and the number of data points kept at that resolution. So, assuming a 5 minute sample rate, you can define:

    RRA:AVERAGE:0.5:1:2000
    RRA:AVERAGE:0.5:12:2400
    

    These will hold about a week at 5m resolution and 100 days at 1h resolution. You could quite easily extend the 5m RRA, although that makes the RRD file bigger. The question is: do you actually need to? The whole point of RRDs is that archive resolution tracks graphing resolution. When you look at a year's worth of stats you can't render 5m resolution anyway: with 5m samples, a 1600px wide graph covers only about 5.5 days.
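To make the retention arithmetic concrete, here is a minimal Python sketch that works out how long an RRA definition keeps data. The helper name `rra_retention` and the `STEP` constant are illustrative only and not part of rrdtool; the 300 second step is the 5 minute sample rate assumed above.

```python
from datetime import timedelta
from typing import Tuple


def rra_retention(rra: str, step_seconds: int) -> Tuple[timedelta, timedelta]:
    """Return (resolution, retention) for an 'RRA:CF:xff:steps:rows' string."""
    _, _cf, _xff, steps, rows = rra.split(":")
    # Each row consolidates `steps` primary data points of `step_seconds` each.
    resolution = timedelta(seconds=step_seconds * int(steps))
    return resolution, resolution * int(rows)


STEP = 300  # assumed 5-minute sample interval

for rra in ("RRA:AVERAGE:0.5:1:2000", "RRA:AVERAGE:0.5:12:2400"):
    resolution, retention = rra_retention(rra, STEP)
    print(f"{rra}: one row per {resolution}, kept for {retention}")

# Time span covered by a 1600px graph drawn at one 5-minute sample per pixel.
print("1600px graph spans", timedelta(seconds=STEP) * 1600)
```

The same function can be used to size additional RRAs (for example MIN and MAX archives at each resolution you want to keep) before creating the file.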

    2. Yes, but because of the way RRD works, it's somewhat awkward. Effectively you have to extract and replay the data to backfill the gaps. This doesn't work too well if you're replaying periods where resolution has already been lost, because you won't have enough samples. You can use rrdtool dump to extract the content of the RRD as XML, edit that directly, and then load it back with rrdtool restore. If you need to do this with any real frequency, I'd suggest using something other than rrdtool.
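As a rough illustration of the dump, edit, restore route, the Python sketch below patches NaN rows in a dump file. Several things here are assumptions: `SAMPLE_DUMP` is a hand-written, heavily trimmed stand-in for real `rrdtool dump` output, the file names in the comments are hypothetical, and `fill_gaps` is not part of rrdtool. It also assumes a single data source (one `<v>` per row) and identifies rows by position, whereas a real dump labels rows with timestamp comments.

```python
import xml.etree.ElementTree as ET

# Workflow around this script (file names are hypothetical):
#   rrdtool dump meter.rrd > meter.xml
#   ... patch meter.xml ...
#   rrdtool restore meter.xml meter_fixed.rrd

# Simplified stand-in for a dump; a real one carries more elements per RRA.
SAMPLE_DUMP = """<rrd>
  <step>300</step>
  <rra>
    <cf>AVERAGE</cf>
    <pdp_per_row>1</pdp_per_row>
    <database>
      <row><v>1.0000000000e+01</v></row>
      <row><v>NaN</v></row>
      <row><v>1.4000000000e+01</v></row>
    </database>
  </rra>
</rrd>"""


def fill_gaps(xml_text: str, corrections: dict) -> str:
    """Replace NaN rows in the first RRA with values keyed by row index."""
    root = ET.fromstring(xml_text)
    rows = root.find("rra/database").findall("row")
    for index, value in corrections.items():
        cell = rows[index].find("v")
        if cell.text.strip() == "NaN":  # only overwrite genuine gaps
            cell.text = f"{value:.10e}"
    return ET.tostring(root, encoding="unicode")


print(fill_gaps(SAMPLE_DUMP, {1: 12.0}))
```

Only the archive that is patched changes, so coarser RRAs that were consolidated from the missing samples would need the same treatment; that is part of why this gets tedious if done often.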