I have the following events that are logged periodically (every minute):
14:58 index=prod_service service.error error.count="3"
14:59 index=prod_service service.error error.count="4"
15:00 index=prod_service service.error error.count="0"
15:01 index=prod_service service.error error.count="10"
I've set up an alert that fires when 10 events in an hour have an error.count greater than "0". However, I would like to change it so that it alerts me when the sum of error.count over all events in an hour is greater than 10. How can I sum error.count over all events (which would be 3 + 4 + 0 + 10 = 17 for the events above)?
My current query only counts the number of events that have more than 0 errors:
index=prod-service service.count | where sum('error.count') > 0
This is what worked for me:
index=prod-service "service.error" | timechart sum(error.count) AS "Count" | stats sum("Count") as "Total"
Then, in the alert settings, I had to choose Custom as the trigger condition instead of Number of Results and enter:
search Total > 10
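A possibly simpler variant, not tested against this data: since only the hourly total matters, the timechart step could probably be dropped and the sum taken directly with stats. This is a minimal sketch. The earliest=-1h time range and the trailing where filter are assumptions added for illustration (the alert's own time range setting may already cover the last hour), and it assumes error.count is extracted as a numeric field:

```
index=prod-service "service.error" earliest=-1h
| stats sum(error.count) AS Total
| where Total > 10
```

With the where clause inside the search, a row is returned only when the total exceeds 10, so the trigger condition could be left at Number of Results greater than 0 instead of using a custom condition.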