Tags: elasticsearch, monitoring, elastic-stack, elastic-beats

Is Elastic/Metricbeat suitable for process monitoring and alerting?


Do you use Elastic and Metricbeat for process monitoring and alerting? If so, how did you configure your data gathering and alerting?

I am currently trying to set this up and am running into some basic issues that make me question whether Elastic is a suitable tool for alerting. Here is my planned setup:

  • Use Metricbeat to gather process data
  • Create an Elastic dashboard/Lens visualization for certain processes
  • If the process.cpu.start_time reported by Metricbeat is very recent (e.g. the process has been running for under 5 minutes), alert!

I have been working my way through this using the following approach:

  • Metricbeat reports process.cpu.start_time for each process as a text string in ISO date format. Elastic Lens queries are very limited when it comes to dates.
  • Workaround: use a Logstash filter to create a field process.cpu.start_epoch, an integer holding the Unix epoch ("seconds since January 1, 1970").
  • Create a dashboard Lens visualization that queries only my process and only the last metric. This works and gives me "the time that the process started, as a Unix epoch".
  • Next, I need to calculate the difference between now and that integer. However, I don't see anything in the Lens documentation about date math, so I'm stuck.
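
For reference, the date math itself is small once the timestamp is parsed. Here is a minimal Python sketch of it, outside of Lens and purely for illustration. The function names, the sample timestamps, and the 5-minute default are assumptions that mirror the plan above; none of this is something Metricbeat or Kibana provides.

```python
from datetime import datetime, timedelta, timezone


def process_age_seconds(start_time_iso: str, now: datetime) -> float:
    """Return how long ago the process started, in seconds.

    Assumes the timestamp carries a UTC "Z" suffix or an explicit offset,
    and that `now` is timezone-aware.
    """
    # Older Python versions reject a trailing "Z", so normalise it first.
    started = datetime.fromisoformat(start_time_iso.replace("Z", "+00:00"))
    return (now - started).total_seconds()


def should_alert(start_time_iso: str, now: datetime,
                 threshold: timedelta = timedelta(minutes=5)) -> bool:
    """True when the process has been running for less than the threshold."""
    return process_age_seconds(start_time_iso, now) < threshold.total_seconds()


if __name__ == "__main__":
    now = datetime(2022, 3, 1, 12, 0, 0, tzinfo=timezone.utc)
    # Process started 2 minutes before `now`: young enough to alert on.
    print(should_alert("2022-03-01T11:58:00.000Z", now))
    # Process started 3 hours before `now`: no alert.
    print(should_alert("2022-03-01T09:00:00.000Z", now))
```

The open question is where to run this subtraction inside Elastic, since "now" changes with every query and therefore cannot be stored at ingest time.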

The difficulties I am encountering make me wonder whether I am "doing it wrong". Is Elastic/Metricbeat a suitable tool for what I am trying to achieve?


Solution

  • Answer: find the right hammer!

    What I needed is an Elastic "runtime field". There's a step-by-step writeup here: https://elastic-content-share.eu/elastic-runtime-field-example-repository/

    Summary:

    • open index
    • click the "dots"
    • choose "add field to index pattern"
    • set output field name as desired
      • for me this is process.cpu.start.age
    • set output type
      • for me this is "long"
    • write your script in "Painless"
      • for me this is emit(new Date().getTime() - doc['process.cpu.start'].value.toEpochMilli()); (the result is the process age in milliseconds)

    PS: I deleted my Logstash filters, because the runtime field made them superfluous.
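
The same runtime field can also be defined per request rather than in the UI, using the `runtime_mappings` section of an Elasticsearch search body. The Python sketch below only builds and prints such a body; it does not contact a cluster. The field names are taken from the answer above, while the 5-minute threshold, the function name, and the use of a `range` filter for the "young process" check are my assumptions, so treat it as a starting point rather than a tested alert rule.

```python
import json

# Field names as used in the answer above; adjust them to your own mapping.
SOURCE_FIELD = "process.cpu.start"      # date field holding the start time
AGE_FIELD = "process.cpu.start.age"     # runtime field: age in milliseconds

# Assumed alert threshold: processes younger than 5 minutes.
THRESHOLD_MS = 5 * 60 * 1000


def young_process_query(source_field: str, age_field: str,
                        threshold_ms: int) -> dict:
    """Build a search body that computes the age at query time and
    keeps only documents whose process age is below the threshold."""
    painless = (
        "emit(new Date().getTime() - "
        f"doc['{source_field}'].value.toEpochMilli());"
    )
    return {
        "runtime_mappings": {
            age_field: {"type": "long", "script": {"source": painless}}
        },
        "query": {"range": {age_field: {"lt": threshold_ms}}},
    }


body = young_process_query(SOURCE_FIELD, AGE_FIELD, THRESHOLD_MS)
print(json.dumps(body, indent=2))
```

Because the age is computed when the query runs, it is always relative to the current time, which is the reason an ingest-time Logstash field could not express it.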