rrdtool outputs wrong numbers


The following script monitors the established TCP connections reported by ss -s:

rrdtool=$(which rrdtool);
db=/opt/rrd/estabconns.rrd
img=/usr/share/nginx/html/awp

export IFS=" ," 
arr=($(ss -s | grep TCP:))
total=${arr[3]}

if [ ! -e $db ]
then
        $rrdtool create $db \
        -s 5 \
        DS:conns:GAUGE:600:0:50000000000 \
        RRA:AVERAGE:0.5:1:576 \
        RRA:AVERAGE:0.5:6:672 \
        RRA:AVERAGE:0.5:24:732 \
        RRA:AVERAGE:0.5:144:1460
fi

$rrdtool updatev $db -t conns N:$total

for period in hour day week month year
do
        $rrdtool graph $img/connections-$period.png -s -1$period \
        -t "ams1 connections - $period" -z \
        -c "BACK#FFFFFF" -c "SHADEA#FFFFFF" -c "SHADEB#FFFFFF" \
        -c "MGRID#AAAAAA" -c "GRID#CCCCCC" -c "ARROW#333333" \
        -c "FONT#333333" -c "AXIS#333333" -c "FRAME#333333" \
        -h 134 -w 543 -l 0 -a PNG -v "concurrent connections" \
        DEF:conns=$db:conns:AVERAGE \
        VDEF:min=conns,MINIMUM \
        VDEF:max=conns,MAXIMUM \
        VDEF:avg=conns,AVERAGE \
        VDEF:lst=conns,LAST \
        "COMMENT: \l" \
        "COMMENT:               " \
        "COMMENT:Minimum    " \
        "COMMENT:Maximum    " \
        "COMMENT:Average    " \
        "COMMENT:Current    \l" \
        "COMMENT:   " \
        "LINE1:conns#0AC43C:Conns  " \
        "GPRINT:min:%6.0lf      " \
        "GPRINT:max:%6.0lf      " \
        "GPRINT:avg:%6.0lf      " \
        "GPRINT:lst:%6.0lf      \l" > /dev/null
done

My cronjob:

*/5 * * * * /bin/bash /root/rrd.sh

The problem is that the graphs show wrong numbers. ss -s always reports 150-300 established TCP connections, but the generated images show values between 0 and 3, sometimes higher, and in general they are wrong.


Solution

  • First, note that the code defines the DS type as GAUGE, but the behaviour you describe sounds a little like a COUNTER. Did you previously create the file as a counter? The script only runs rrdtool create when the file does not exist, so an older definition would still be in effect. Run rrdtool info $db on the RRD file and verify that the DS is, in fact, a GAUGE. You may find that it is not.

    Secondly, the RRD file has a step of 5 seconds and the DS has a 10-minute (600s) heartbeat, but cron updates it only every 5 minutes. Did you mean to have a 5-minute (300s) step? Otherwise you will get some interesting effects as rows are filled; the step should usually match the expected update frequency.

    Thirdly, you may have problems because the update uses the timestamp N (now) rather than a time precisely on the step boundary. This causes data normalisation, though not enough to turn 300 into 3. Fix this by rounding the timestamp down to a step boundary, using something like

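    # Round "now" down to the nearest step boundary before updating.
    step=300    # must match the step of the RRD (5 minutes assumed here)
    time=$(date +%s)
    offset=$((time % step))
    t=$((time - offset))
    $rrdtool update $db -t conns $t:$total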
    

    Finally, are you SURE you are using the correct values to update? Make your script write the value of $total to a log file each time it updates, so you can confirm that you are updating with the values you think you are.
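
To make the step mismatch from the second point concrete, here is a rough sketch of how much history each RRA in the question's create command can hold. It relies only on the fact that an RRA spans step × steps-per-row × rows seconds; the helper name rra_span is made up for illustration. With a 5s step the four archives cover about 48 minutes, 5.6 hours, 24.4 hours and 12.2 days, whereas with a 300s step they would cover 2, 14, 61 and 730 days, which lines up much better with the week, month and year graphs.

```shell
#!/bin/bash
# Seconds of history one RRA can hold: step * steps-per-row * rows.
# (rra_span is a hypothetical helper, not part of rrdtool.)
rra_span() {
    local step=$1 pdp_per_row=$2 rows=$3
    echo $(( step * pdp_per_row * rows ))
}

# RRA definitions (steps:rows) copied from the create command in the question.
for rra in 1:576 6:672 24:732 144:1460; do
    printf 'RRA %-9s step=5s -> %8ds   step=300s -> %9ds\n' \
        "$rra" \
        "$(rra_span 5 "${rra%%:*}" "${rra##*:}")" \
        "$(rra_span 300 "${rra%%:*}" "${rra##*:}")"
done
```

Note that changing the step means recreating the RRD file, since the existing file keeps the step it was created with.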
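
The checks suggested in the first and last points could be scripted roughly as follows. This is only a sketch under a few assumptions: the function names and the log path are invented here, the TCP: line of ss -s is assumed to look like "TCP:   212 (estab 187, closed 9, ...)", and rrdtool info is assumed to print lines such as step = 5 and ds[conns].type = "GAUGE". The functions read from stdin so they can be tried on saved output first.

```shell
#!/bin/bash
# Sketch only: function names and the log path are hypothetical.

# Pull the "estab" count out of the TCP: line of `ss -s` output on stdin.
# Assumes a line such as: TCP:   212 (estab 187, closed 9, orphaned 0, ...)
extract_estab() {
    sed -n 's/^TCP:.*(estab \([0-9][0-9]*\).*/\1/p'
}

# Append "<epoch seconds> <value>" to a log file so every update can be
# compared later with what the graphs show.
log_total() {
    local logfile=$1 value=$2
    printf '%s %s\n' "$(date +%s)" "$value" >> "$logfile"
}

# Keep only the step, DS type and heartbeat lines from `rrdtool info`
# output on stdin (the DS is named "conns" in the question's script).
summarize_info() {
    grep -E '^step = |^ds\[conns\]\.(type|minimal_heartbeat) = '
}

# Intended use inside the cron script (left commented out here):
#   total=$(ss -s | extract_estab)
#   log_total /tmp/estabconns.log "$total"
#   rrdtool info "$db" | summarize_info
```

If the logged values sit in the 150-300 range while the graphs still show 0 to 3, the update value is fine and the RRD definition (DS type, step) is the more likely culprit.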