
Azure Monitor Custom log search Query - understanding Period and Frequency


UPDATE:

The actual problem is different from what I described. I'll update this question once we resolve the issue. More details may be found in this thread: https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Reliably-trigger-alerts-for-Log-Analytics-log-entries/m-p/319315/highlight/false#M1224

Original question:

We use Azure Monitor to create alerts based on logs in Log Analytics. For this we choose our Log Analytics account as the "RESOURCE", then choose the "Custom log search" signal name for the "CONDITION". The alert logic is "Number of results greater than 0".

Sample query:

search *
| where ResourceProvider == "MICROSOFT.DATAFACTORY" and status_s == "Failed"

For Period and Frequency, let's set 15 minutes. It all looks simple, but...

The issue: the setup described above works only intermittently. Alerts fire only some of the time and many of them are missed, which is completely unacceptable behavior.

If we set Period = Frequency = 5 minutes, we miss almost every event. Period = Frequency = 15 minutes works better, but many events are still missed. Period = Frequency = 30 minutes works better still, but all of this looks weird.

Important note: the logs are collected from Data Factory V2 into Log Analytics. I suspect the alerts are missed because logs are delivered to Log Analytics with some delay (up to several minutes). So when Azure Monitor evaluates the alert query for the last 15 minutes (Period = 15), the most recent log entries may not be in Log Analytics yet. When the next evaluation runs 15 minutes later, it will miss the logs that belong to the previous 15-minute interval but were ingested late. Is this assumption correct? If so, this is very weird: how are we supposed to configure the Period and Frequency values? If I set Period > Frequency (e.g. Period = 30 and Frequency = 5, which means "evaluate the expression every 5 minutes over the data for the last 30 minutes from the current time"), then we get multiple duplicate alerts, because with Period larger than Frequency there is a good chance that the log search query returns the same log entries every 5 minutes. This is highly undesirable behavior.
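To make the arithmetic of this hypothesis concrete, here is a minimal Python sketch of it. It is a toy model, not Azure code: the function name `count_alert_hits`, the two events and their 4-minute ingestion delay are all made up for illustration, and it assumes that each run filters on the time the log entry was generated, over the window (run time - Period, run time], and can only see entries that are already ingested at run time.

```python
def count_alert_hits(events, period, frequency, horizon):
    """Count how many scheduled runs return each event (toy model).

    events    -- dict of name -> (generated_at, ingested_at), in minutes
    period    -- length of the query window, in minutes
    frequency -- interval between runs, in minutes
    horizon   -- time of the last simulated run, in minutes
    """
    hits = {name: 0 for name in events}
    for run_time in range(frequency, horizon + 1, frequency):
        window_start = run_time - period
        for name, (generated_at, ingested_at) in events.items():
            # Assumption: the window is applied to the generation time.
            in_window = window_start < generated_at <= run_time
            # Assumption: an entry is invisible until it has been ingested.
            visible = ingested_at <= run_time
            if in_window and visible:
                hits[name] += 1
    return hits


# Hypothetical failures, each ingested 4 minutes after it was generated.
events = {"A": (13, 17), "B": (20, 24)}

tumbling = count_alert_hits(events, period=15, frequency=15, horizon=45)
overlapping = count_alert_hits(events, period=30, frequency=5, horizon=45)

print(tumbling)     # {'A': 0, 'B': 1}
print(overlapping)  # {'A': 5, 'B': 5}
```

In this model, event A is generated just before the run at minute 15 but ingested just after it, so with Period = Frequency = 15 no run ever returns it. With Period = 30 and Frequency = 5 nothing is missed, but each event is returned by five consecutive runs, which corresponds to the duplicate alerts described above.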


Solution

  • The issue turned out to be buggy behavior in the ARM template that creates the alerts. Thanks to Stanislav Zhelyazkov, it has been tracked down and resolved. I now use an alternative ARM API and it seems to work fine. More details on the topic may be found here: https://techcommunity.microsoft.com/t5/Azure-Log-Analytics/Reliably-trigger-alerts-for-Log-Analytics-log-entries/m-p/309610.