I'm updating a rrdtool
round-robin database with 1 minute intervals. I want to store the average value of five updates as one RRA entry in rrdtool
RRD. One way to do this is like this:
$ rrdtool create foo.rrd --start 1000000500 --step 60 \
> DS:ping:GAUGE:120:0:1000 RRA:AVERAGE:0.5:5:12; \
> rrdtool update foo.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
It accumulates five readings and stores the average of those as an entry in RRA. However, one could achieve the same with this:
$ rrdtool create bar.rrd --start 1000000500 --step 300 \
> DS:ping:GAUGE:600:0:1000 RRA:AVERAGE:0.5:1:12; \
> rrdtool update bar.rrd 1000000560:10 1000000620:20 \
> 1000000680:30 1000000740:40 1000000800:50
As seen above, step
is 300 seconds, but as RRD PDP accepts values between the intervals and calculates the average, then both examples store 30((10+20+30+40+50)/5
) in RRA. One difference, which I can tell, is that first example requires at least three updates to store an entry to RRA while in case of the second example, a single update within 300 second step is enough. Are there any other differences?
These two examples are not really the same thing under the covers, though they can appear the same in some circumstances.
In the first, you have a 60s step, and your RRA stores the average of 5 PDPs in each CDP.
In the second, you have a 300s step, and your RRA stores each PDP as a CDP.
Here are some differences:
While the RRA outcome is pretty much the same in both, which you choose would depend on the nature of your incoming data (how often you get samples, how irregular these are), and your storage and display requirements (if you need higher granularity stored or displayed). The first option is more accurate if you have frequent samples; the second is less storage and less work but may sacrifice some data if you have updates more frequent than the step.
Note that, if you have other RRA types than just AVG, having a smaller step will make calculations more accurate.
In general, I would recommend that you set the step to be close to the expected average sample frequency,with a latency setting according to how regular the data are. Set your RRA consolodation depending on how you need to view and display the data, and how long you are required to hold history for. Create RRAs corresponding to normal display rollups, in order to minimise the amount of on-the-fly calculations.