How do you get a single value and run a conditional check on a percentage in an Elasticsearch query using the Sentinl plugin?


I'm using Elasticsearch 6.4 and Kibana 6. Also I am using the Sentinl plugin.

https://github.com/sirensolutions/sentinl

This plugin is a free alternative to X-Pack Watcher and monitoring. I'm having some difficulty writing the watcher queries correctly, however. I just want to pull the latest value of the percentage and alert when that percentage is above 90%.

My Query:

{
  "actions": {
    "email_html_alarm_4b1479be-5e70-492e-9e02-fb08412510ee": {
      "name": "Check CPU Usage for ip-172-0-0-0",
      "throttle_period": "1m",
      "email_html": {
        "stateless": false,
        "to": "[email protected]",
        "from": "[email protected]",
        "subject": "Critical CPU Usage Percent over 90% : {{ payload.aggregations.cpu_used.value  }}",
        "priority": "high",
        "html": "<p>Your elasticsearch is using more than 90% of its CPU: {{ payload.aggregations.cpu_used.value }}. Please scale the cluster. found by the watcher <i>{{watcher.title}}</i>.</p>"
      }
    }
  },
  "input": {
    "search": {
      "request": {
        "index": [
          "metricbeat-*"
        ],
        "body": {
          "from": 0,
          "size": 1,
          "query": {
            "bool": {
              "must": [
                {
                  "exists": {
                    "field": "system.cpu.total.pct"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-20s"
                    }
                  }
                },
                {
                  "query_string": {
                    "query": "beat.name:ip-172-0-0-0"
                  }
                }
              ]
            }
          },
          "aggs": {
            "cpu_used": {
              "terms": {
                "field": "system.cpu.total.pct",
                "size": 1
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "array_compare": {
      "payload.aggregations.cpu_used.buckets": {
        "path": "key",
        "gt": {
          "value": 0.9
        }
      }
    }
  },
  "trigger": {
    "schedule": {
      "later": "every 2 minutes"
    }
  },
  "disable": true,
  "report": true,
  "title": "CPU Check for ip-172-0-0-0",
  "wizard": {},
  "save_payload": false,
  "spy": false,
  "impersonate": false
}

The issue with this watcher is that it triggers even when the value reads 0.1445555, which is nowhere near the 0.9 threshold set in the condition.

When I run my query in the developer tools in kibana it returns the following:

{
  "took": 86,
  "timed_out": false,
  "_shards": {
    "total": 354,
    "successful": 354,
    "skipped": 331,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "metricbeat-production-6.4.2-2018.10.25",
        "_type": "doc",
        "_id": "T4KBrGYBIGbF3wm-Wgpd",
        "_score": 1,
        "_source": {
          "@timestamp": "2018-10-25T18:34:09.872Z",
          "host": {
            "name": "ip-172-0-0-0"
          },
          "metricset": {
            "name": "cpu",
            "module": "system",
            "rtt": 153
          },
          "system": {
            "cpu": {
              "user": {
                "pct": 0.1235
              },
              "idle": {
                "pct": 1.4769
              },
              "nice": {
                "pct": 0
              },
              "irq": {
                "pct": 0
              },
              "steal": {
                "pct": 0.002
              },
              "total": {
                "pct": 0.1406
              },
              "cores": 2,
              "softirq": {
                "pct": 0.001
              },
              "system": {
                "pct": 0.0141
              },
              "iowait": {
                "pct": 0.3825
              }
            }
          },
          "beat": {
            "hostname": "ip-172-0-0-0",
            "version": "6.4.2",
            "name": "ip-172-0-0-0"
          }
        }
      }
    ]
  },
  "aggregations": {
    "cpu_used": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 1,
      "buckets": [
        {
          "key": 0.14100000000000001,
          "doc_count": 1
        }
      ]
    }
  }
}

Given the array_compare logic, this shouldn't trip the alarm, correct? The key value is only 0.14, not above 0.90, which leads me to believe that I'm not getting the right value.

Sentinl describes array compare as:

Array compare condition

Use array_compare to compare an array of values. For example, the following array_compare condition returns true if there is at least one bucket in the aggregation that has a doc_count greater than or equal to 25:

"condition": {
  "array_compare": {
    "payload.aggregations.top_amounts.buckets": {
      "path": "doc_count",
      "gte": {
        "value": 25
      }
    }
  }
}

Options

Name: Description

  • array.path: The path to the array in the execution context, specified in dot notation.
  • array.path.path: The path to the field in each array element that you want to evaluate.
  • array.path.operator.quantifier: How many matches are required for the comparison to evaluate to true: some or all. Defaults to some (there must be at least one match). If the array is empty, the comparison evaluates to false.
  • array.path.operator.value: The value to compare against.
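
To make the option layout concrete, here is a minimal sketch of the same kind of condition applied to the cpu_used buckets from my watcher, with the quantifier spelled out. It assumes the quantifier sits next to value inside the operator object, as the array.path.operator.quantifier name suggests; I have not confirmed that Sentinl honours it.

```json
{
  "condition": {
    "array_compare": {
      "payload.aggregations.cpu_used.buckets": {
        "path": "key",
        "gte": {
          "value": 0.9,
          "quantifier": "all"
        }
      }
    }
  }
}
```

With "all", every bucket key would have to be at least 0.9 for the condition to be true; with the default "some", a single matching bucket is enough.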

Can someone help me with what I'm doing wrong in my watcher and/or query? I can't seem to get it to pull the percent and run the check against that value.


Solution

  • These are the queries that ended up working with Metricbeat.

    {
      "actions": {
        "email_html_alarm_4b1479be": {
          "name": "Check Disk Usage for ip-0-0-0-0",
          "throttle_period": "1m",
          "email_html": {
            "stateless": false,
            "to": "[email protected]",
            "from": "[email protected]",
            "subject": "Critical /dev/xvda1 Available Bytes: {{ payload.aggregations.disk_used.value  }}",
            "priority": "high",
            "html": "<p>Your elasticsearch node only has: {{ payload.aggregations.disk_used.value }} bytes available. Please snapshot and clean old indexes. found by the watcher <i>{{watcher.title}}</i>.</p>"
          }
        }
      },
      "input": {
        "search": {
          "request": {
            "index": [
              "metricbeat-*"
            ],
            "body": {
              "from": 0,
              "size": 1,
              "query": {
                "bool": {
                  "must": [
                    {
                      "exists": {
                        "field": "system.filesystem.used.pct"
                      }
                    },
                    {
                      "match": {
                        "system.filesystem.device_name": "/dev/xvda1"
                      }
                    }
                  ],
                  "filter": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "now-1m"
                        }
                      }
                    },
                    {
                      "query_string": {
                        "query": "beat.name:ip-0-0-0-0"
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "disk_used": {
                  "avg": {
                    "field": "system.filesystem.available"
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "script": {
          "script": "payload.aggregations.disk_used.value < 490497080832"
        }
      },
      "trigger": {
        "schedule": {
          "later": "every 2 minutes"
        }
      },
      "disable": false,
      "report": true,
      "title": "ESDisk",
      "wizard": {},
      "save_payload": false,
      "spy": false,
      "impersonate": false
    }
    
    
    
    Second query (CPU check):

    {
      "actions": {
        "email_html_alarm_4b1479be": {
          "name": "Check CPU Usage for ip-0-0-0-0",
          "throttle_period": "1m",
          "email_html": {
            "stateless": false,
            "to": "[email protected]",
            "from": "[email protected]",
            "subject": "Critical CPU Usage Percent over 90% : {{ payload.aggregations.cpu_used.value  }}",
            "priority": "high",
            "html": "<p>Your elasticsearch is using more than 90% of its CPU: {{ payload.aggregations.cpu_used.value }}. Please scale the cluster. found by the watcher <i>{{watcher.title}}</i>.</p>"
          }
        }
      },
      "input": {
        "search": {
          "request": {
            "index": [
              "metricbeat-*"
            ],
            "body": {
              "from": 0,
              "size": 1,
              "query": {
                "bool": {
                  "must": [
                    {
                      "exists": {
                        "field": "system.cpu.total.pct"
                      }
                    }
                  ],
                  "filter": [
                    {
                      "range": {
                        "@timestamp": {
                          "gte": "now-20s"
                        }
                      }
                    },
                    {
                      "query_string": {
                        "query": "beat.name:ip-0-0-0-0"
                      }
                    }
                  ]
                }
              },
              "aggs": {
                "cpu_used": {
                  "terms": {
                    "field": "system.cpu.total.pct",
                    "size": 1
                  }
                }
              }
            }
          }
        }
      },
      "condition": {
        "array_compare": {
          "payload.aggregations.cpu_used.buckets": {
            "path": "key",
            "gte": {
              "value": 0.90
            }
          }
        }
      },
      "trigger": {
        "schedule": {
          "later": "every 2 minutes"
        }
      },
      "disable": true,
      "report": true,
      "title": "CPU Check for ip-0-0-0-0",
      "wizard": {},
      "save_payload": false,
      "spy": false,
      "impersonate": false
    }
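
As a possible variation on the CPU check, the pattern from the disk watcher (a single-value metric aggregation plus a script condition) can be applied to system.cpu.total.pct as well. The sketch below is an untested assumption rather than a confirmed fix: it swaps the terms aggregation for a max aggregation and compares the resulting value in a script condition. Only the input and condition parts are shown; actions, trigger and the remaining settings would stay as in the watcher above.

```json
{
  "input": {
    "search": {
      "request": {
        "index": [
          "metricbeat-*"
        ],
        "body": {
          "size": 0,
          "query": {
            "bool": {
              "must": [
                {
                  "exists": {
                    "field": "system.cpu.total.pct"
                  }
                }
              ],
              "filter": [
                {
                  "range": {
                    "@timestamp": {
                      "gte": "now-20s"
                    }
                  }
                },
                {
                  "query_string": {
                    "query": "beat.name:ip-0-0-0-0"
                  }
                }
              ]
            }
          },
          "aggs": {
            "cpu_used": {
              "max": {
                "field": "system.cpu.total.pct"
              }
            }
          }
        }
      }
    }
  },
  "condition": {
    "script": {
      "script": "payload.aggregations.cpu_used.value >= 0.9"
    }
  }
}
```

A metric aggregation such as max or avg returns a single value field, so the {{ payload.aggregations.cpu_used.value }} placeholder used in the email subject and body has something to render. A terms aggregation only returns buckets, as the response in the question shows. If strictly the most recent reading is wanted rather than the maximum over the window, sorting on @timestamp in descending order with size 1 and checking the first hit is another option to try.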