OpenNMS - Storage (SNMP MIB-2 Host Resources) giving incorrect values


I am using OpenNMS Horizon to monitor several nodes. For one node it collects "Storage (SNMP MIB-2 Host Resources)", which reports disk space usage in percent. For the node's local disks I get correct values. For the SAN file system disks I get wrong values, including negative ones, although a few SAN volumes do report correct values. What are the possible reasons for this error?


Solution

  • The data comes from the default MIB-II data collection configuration defined in ${OPENNMS_HOME}/etc/datacollection/mib2.xml.

    <resourceType name="hrStorageIndex" label="Storage (SNMP MIB-2 Host Resources)" resourceLabel="${hrStorageDescr}">
      <persistenceSelectorStrategy class="org.opennms.netmgt.collection.support.PersistAllSelectorStrategy"/>
      <storageStrategy class="org.opennms.netmgt.dao.support.SiblingColumnStorageStrategy">
        <parameter key="sibling-column-name" value="hrStorageDescr"/>
        <parameter key="replace-first" value="s/^-$/_root_fs/"/>
        <parameter key="replace-all" value="s/^-//"/>
        <parameter key="replace-all" value="s/\s//"/>
        <parameter key="replace-all" value="s/:\\.*//"/>
      </storageStrategy>
    </resourceType>
    

    The resource type definition tells the SNMP collector how to deal with multiple instances of the disks.

    The following part tells the SNMP collector which OIDs are queried and persisted for each selected disk instance:

    <group name="mib2-host-resources-storage" ifType="all">
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.2" instance="hrStorageIndex" alias="hrStorageType" type="string"/>
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.3" instance="hrStorageIndex" alias="hrStorageDescr" type="string"/>
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.4" instance="hrStorageIndex" alias="hrStorageAllocUnits" type="gauge"/>
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.5" instance="hrStorageIndex" alias="hrStorageSize" type="gauge"/>
      <mibObj oid=".1.3.6.1.2.1.25.2.3.1.6" instance="hrStorageIndex" alias="hrStorageUsed" type="gauge"/>
    </group>
    

    The first thing I would investigate is which values the device's SNMP agent actually returns. You can check this by running the snmpwalk command-line tool against the OIDs above.
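    To compare the raw values per disk, it can help to group the snmpwalk output by table index. The following is a minimal Python sketch, assuming numeric-OID output lines of the form `.1.3.6.1.2.1.25.2.3.1.5.31 = INTEGER: 2621440`. The function name `parse_walk` and the sample values are made up for illustration and do not come from a real device.

```python
import re
from collections import defaultdict

# Column numbers below hrStorageEntry, matching the mibObj definitions above.
HR_STORAGE_ENTRY = ".1.3.6.1.2.1.25.2.3.1"
COLUMNS = {
    2: "hrStorageType",
    3: "hrStorageDescr",
    4: "hrStorageAllocUnits",
    5: "hrStorageSize",
    6: "hrStorageUsed",
}
NUMERIC = {"hrStorageAllocUnits", "hrStorageSize", "hrStorageUsed"}

# <entry>.<column>.<index> = [TYPE: ]value
LINE_RE = re.compile(
    r"^" + re.escape(HR_STORAGE_ENTRY) + r"\.(\d+)\.(\d+) = (?:([A-Za-z0-9-]+): )?(.*)$"
)


def parse_walk(text):
    """Group walk output lines into {index: {alias: value}}."""
    rows = defaultdict(dict)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines that are not hrStorageTable columns
        column, index = int(match.group(1)), int(match.group(2))
        alias = COLUMNS.get(column)
        if alias is None:
            continue
        value = match.group(4)
        if alias in NUMERIC:
            # Keep only the leading integer, e.g. "4096 Bytes" becomes 4096.
            number = re.match(r"-?\d+", value)
            value = int(number.group()) if number else None
        else:
            value = value.strip('"')
        rows[index][alias] = value
    return dict(rows)


# Made-up sample output: index 31 looks plausible, index 40 has a negative size.
SAMPLE = """\
.1.3.6.1.2.1.25.2.3.1.3.31 = STRING: /
.1.3.6.1.2.1.25.2.3.1.4.31 = INTEGER: 4096 Bytes
.1.3.6.1.2.1.25.2.3.1.5.31 = INTEGER: 2621440
.1.3.6.1.2.1.25.2.3.1.6.31 = INTEGER: 1310720
.1.3.6.1.2.1.25.2.3.1.3.40 = STRING: /san/vol1
.1.3.6.1.2.1.25.2.3.1.4.40 = INTEGER: 4096 Bytes
.1.3.6.1.2.1.25.2.3.1.5.40 = INTEGER: -1610612736
.1.3.6.1.2.1.25.2.3.1.6.40 = INTEGER: 536870912
"""

rows = parse_walk(SAMPLE)
for index, row in sorted(rows.items()):
    print(index, row)
```

    A negative hrStorageSize or hrStorageUsed in this listing would already explain a negative percentage in the graph, before OpenNMS is involved at all.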

    By default, the received values are persisted with RRDTool, and the percentage is calculated in the RRD graph template, which you can find in ${OPENNMS_HOME}/etc/snmp-graph.properties.d/mib2-graph.properties.

    The complete RRD template definition looks like this:

    report.mib2.storage.usage.name=Storage Utilization (MIB-2 Host Resources)
    report.mib2.storage.usage.columns=hrStorageSize, hrStorageUsed, hrStorageAllocUnits
    report.mib2.storage.usage.propertiesValues=hrStorageDescr
    report.mib2.storage.usage.type=hrStorageIndex
    report.mib2.storage.usage.command=--title="Storage Utilization on {hrStorageDescr}" \
     --vertical-label="Percentage (%)" \
     --base=1024 \
     --lower-limit 0 \
     --upper-limit 105 \
     DEF:total={rrd1}:hrStorageSize:AVERAGE \
     DEF:used={rrd2}:hrStorageUsed:AVERAGE \
     DEF:units={rrd3}:hrStorageAllocUnits:AVERAGE \
     CDEF:totalBytes=total,units,* \
     CDEF:usedBytes=used,units,* \
     CDEF:usedPart=usedBytes,totalBytes,/ \
     CDEF:dpercent=usedPart,100,* \
     CDEF:dpercent10=0,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent20=10,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent30=20,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent40=30,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent50=40,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent60=50,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent70=60,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent80=70,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent90=80,dpercent,GT,0,dpercent,IF \
     CDEF:dpercent100=90,dpercent,GT,0,dpercent,IF \
     COMMENT:"Storage used in (%)\\n" \
     AREA:dpercent10#5ca53f:"0-10% " \
     AREA:dpercent20#75b731:"11-20%" \
     AREA:dpercent30#90c22f:"21-30%" \
     AREA:dpercent40#b8d029:"31-40%" \
     AREA:dpercent50#e4e11e:"41-50%" \
     COMMENT:"\\n" \
     AREA:dpercent60#fee610:"51-60%" \
     AREA:dpercent70#f4bd1b:"61-70%" \
     AREA:dpercent80#eaa322:"71-80%" \
     AREA:dpercent90#de6822:"81-90%" \
     AREA:dpercent100#d94c20:"91-100%\\n" \
     COMMENT:"\\n" \
     HRULE:100#d94c20 \
     COMMENT:"\\n" \
     LINE1:dpercent#46683b:"Storage used in (%)" \
     GPRINT:dpercent:AVERAGE:"Avg\\: %7.2lf%s" \
     GPRINT:dpercent:MIN:"Min\\: %7.2lf%s" \
     GPRINT:dpercent:MAX:"Max\\: %7.2lf%s\\n" \
     COMMENT:"\\n" \
     COMMENT:"Used Bytes\\: \\n" \
     GPRINT:usedBytes:AVERAGE:"Avg\\: %7.2lf%s" \
     GPRINT:usedBytes:MIN:"Min\\: %7.2lf%s" \
     GPRINT:usedBytes:MAX:"Max\\: %7.2lf%s\\n" \
     COMMENT:"\\n" \
     GPRINT:totalBytes:AVERAGE:"Total Bytes\\: %7.2lf%s"
    

    The most important part for getting the percentage is this calculation:

    DEF:total={rrd1}:hrStorageSize:AVERAGE \
    DEF:used={rrd2}:hrStorageUsed:AVERAGE \
    DEF:units={rrd3}:hrStorageAllocUnits:AVERAGE \
    CDEF:totalBytes=total,units,* \
    CDEF:usedBytes=used,units,* \
    CDEF:usedPart=usedBytes,totalBytes,/ \
    CDEF:dpercent=usedPart,100,* \
    

    It uses RRDTool's reverse Polish notation (RPN) to calculate the utilization in percent from the used bytes and the total bytes. As you can see, both byte counts are themselves derived by multiplying the allocation unit size in bytes reported by the SNMP agent (hrStorageAllocUnits) by the number of units in total (hrStorageSize) and in use (hrStorageUsed).
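    To make the RPN expressions easier to follow, here is a small Python sketch of an evaluator that supports only the operators used in this template (`*`, `/`, `GT`, `IF`). It is an illustration, not RRDTool itself: the names `eval_rpn` and `storage_percent` are hypothetical, the input numbers are made up, and mapping a division by zero to NaN is a simplification.

```python
import math


def eval_rpn(expr, variables):
    """Evaluate a comma-separated RPN expression such as 'total,units,*'."""
    stack = []
    for token in expr.split(","):
        if token == "*":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif token == "/":
            b, a = stack.pop(), stack.pop()
            # Simplification: treat division by zero as "no value".
            stack.append(a / b if b != 0 else math.nan)
        elif token == "GT":
            b, a = stack.pop(), stack.pop()
            stack.append(1.0 if a > b else 0.0)
        elif token == "IF":
            # A,B,C,IF yields B when A is non-zero, otherwise C.
            c, b, a = stack.pop(), stack.pop(), stack.pop()
            stack.append(b if a != 0 else c)
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError("malformed RPN expression: " + expr)
    return stack[0]


def storage_percent(total, used, units):
    """Replay the CDEF chain from the graph template for one sample."""
    v = {"total": total, "used": used, "units": units}
    v["totalBytes"] = eval_rpn("total,units,*", v)
    v["usedBytes"] = eval_rpn("used,units,*", v)
    v["usedPart"] = eval_rpn("usedBytes,totalBytes,/", v)
    v["dpercent"] = eval_rpn("usedPart,100,*", v)
    return v


# Made-up values: a plausible volume, then one whose size is negative.
good = storage_percent(total=2621440, used=1310720, units=4096)
bad = storage_percent(total=-1610612736, used=536870912, units=4096)
print(good["dpercent"], bad["dpercent"])
```

    With a negative hrStorageSize the same chain produces a negative percentage, which matches the symptom in the question. The first colour band, `0,dpercent,GT,0,dpercent,IF`, only clamps negative results to zero for the filled area; the `LINE1` and `GPRINT` entries still use the unclamped `dpercent`.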

    You can check whether the SNMP agent on the device returns reasonable values by recalculating the percentage yourself from the raw numbers.
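    One possible explanation worth checking (an assumption, not something confirmed for this setup): the HOST-RESOURCES-MIB defines hrStorageSize and hrStorageUsed as Integer32 with a maximum of 2147483647, so an agent reporting a very large volume may return a wrapped or negative number. The Python sketch below flags suspicious rows and shows what the percentage would be if a negative value were reinterpreted as unsigned 32-bit. The function names and the sample numbers are hypothetical.

```python
INT32_MAX = 2**31 - 1


def as_unsigned32(value):
    """Reinterpret a negative signed 32-bit value as unsigned.

    This is only a guess at what the agent meant; a value that wrapped
    more than once cannot be recovered this way.
    """
    return value + 2**32 if value < 0 else value


def check_storage(size, used):
    """Return a list of reasons why a size/used pair looks implausible."""
    findings = []
    if size < 0 or used < 0:
        findings.append("negative value, possibly a 32-bit overflow in the agent")
    if size == 0:
        findings.append("size is zero, percentage is undefined")
    if size > 0 and used > size:
        findings.append("used exceeds size, one of the values may have wrapped")
    if size > INT32_MAX or used > INT32_MAX:
        findings.append("value exceeds the Integer32 range of the MIB")
    return findings


def percent(size, used):
    """Usage in percent; the allocation unit size cancels out of the ratio."""
    return 100.0 * used / size


# Made-up raw values for a SAN volume that shows a negative percentage.
size, used = -1610612736, 536870912
print(check_storage(size, used))
print(percent(size, used))                 # negative, as seen in the graph
print(percent(as_unsigned32(size), used))  # value under the overflow assumption
```

    If the reinterpreted value looks plausible for the volume, the problem lies in the SNMP agent's reporting rather than in the OpenNMS graph definition, and the agent's documentation is the place to look for an option that reports larger allocation units.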

    I hope this helps to debug your issue.