
gcp monitoring "Any time series violates" vs "All time series violate"



What's the difference between the two options "Any time series violates" and "All time series violate"? I can easily imagine what the former does, but I have no idea what the latter does.

All time series? How long is its range? And why does it have a "for" option?


Solution


    First, what does "time series violates" mean? It's when the CURRENT VALUE of the metric is outside the expected range, e.g. above the specified threshold.

    Second, "any/all/percent/number": let's say you have 5 time series, e.g. CPU usage on 5 instances. Then, depending on the dropdown option, the whole alert condition is in violation when:

    • "any time series": any 1 of the time series is in violation
    • "all time series": all 5 of the time series are in violation
    • "percent of time series" (40%): 2 out of 5 of the time series are in violation. And yes, with such small numbers, selecting 39% or 41% can give you different results, so pick the percentage with the number of time series in mind
    • "number of time series" (3): 3 out of 5 of the time series are in violation
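To make the four options concrete, here is a minimal sketch that evaluates them over the 5-instance CPU example. The helper `condition_violates`, the instance names, and the 0.9 threshold are hypothetical. The "percent" branch assumes the condition fires when the violating share is greater than or equal to the chosen percentage; Cloud Monitoring's exact rounding is an assumption here, not something the answer states.

```python
def condition_violates(series_violating, mode, value=None):
    """Combine per-series violation flags into one condition result.

    mode: "any", "all", "count" (value = minimum number of series),
    or "percent" (value = minimum percentage of series, 0-100).
    """
    total = len(series_violating)
    violating = sum(1 for v in series_violating if v)
    if mode == "any":
        return violating >= 1
    if mode == "all":
        return total > 0 and violating == total
    if mode == "count":
        return violating >= value
    if mode == "percent":
        # Integer arithmetic avoids float rounding: violating/total >= value/100
        return total > 0 and violating * 100 >= value * total
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical current CPU utilization (0.0-1.0) of 5 instances
cpu = {"vm-1": 0.95, "vm-2": 0.97, "vm-3": 0.40, "vm-4": 0.35, "vm-5": 0.20}
THRESHOLD = 0.9

# A single series "violates" when its current value is above the threshold
violating = [value > THRESHOLD for value in cpu.values()]

if __name__ == "__main__":
    for mode, value in [("any", None), ("all", None), ("percent", 40),
                        ("percent", 41), ("count", 3)]:
        print(mode, value, condition_violates(violating, mode, value))
```

With 2 of 5 series violating, "any" and 40% fire while "all", 41%, and 3 series do not; under this assumed rule 39% still needs 2 series but 41% needs 3, which is the small-number sensitivity mentioned above.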

    Third, the "for" field (aka the Duration box): it looks like it means "if my time series violates FOR 5 minutes, then the condition is violated". For some simpler alerts this can even work, but once you try to combine it with, say, "metric is absent" or another complicated configuration, you will see that what actually happens is "wait 5 minutes after the problem appears, and only then trigger the violation".
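For a plain threshold on a steadily violating series, the Duration box behaves roughly like the model below: nothing fires until the violation has persisted for the whole duration, so the alert arrives that long after the problem starts. This is a simplified illustration with a hypothetical helper `first_fire_time` and samples in whole minutes; it is not Cloud Monitoring's actual evaluator and ignores data gaps, absent metrics, and alignment.

```python
def first_fire_time(samples, threshold, duration):
    """Return the first time at which the value has stayed above `threshold`
    continuously for at least `duration`, or None if that never happens.

    samples: list of (time, value) pairs sorted by time.
    """
    streak_start = None
    for t, value in samples:
        if value > threshold:
            if streak_start is None:
                streak_start = t  # violation begins
            if t - streak_start >= duration:
                return t  # violation has lasted long enough
        else:
            streak_start = None  # any good sample resets the streak
    return None


# Hypothetical per-minute CPU samples: fine until minute 3, then above 0.9
samples = [(t, 0.95 if t >= 3 else 0.5) for t in range(11)]

if __name__ == "__main__":
    print(first_fire_time(samples, 0.9, 0))  # "most recent value" style
    print(first_fire_time(samples, 0.9, 5))  # "for 5 minutes"
```

The problem starts at minute 3; with a 5-minute duration the model fires at minute 8, i.e. it waits 5 minutes after the problem is there.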

    In practice, using the "for" field is discouraged, and it's better to keep it at the default "Most recent value".
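If you manage policies as JSON through the Cloud Monitoring API rather than the console, the same choices appear as fields of a threshold condition. The sketch below builds such a condition as a Python dict. The field names follow the `AlertPolicy` REST resource as I understand it, but the mapping of UI options to `trigger` values ("all" as `percent: 100`) and of "Most recent value" to `duration: "0s"` is an assumption to verify against the official reference; the metric filter, display name, and threshold are illustrative.

```python
import json

# Assumed mapping of the console dropdown options to the `trigger` field
TRIGGERS = {
    "any": {"count": 1},
    "all": {"percent": 100},
    "percent_40": {"percent": 40},
    "number_3": {"count": 3},
}


def threshold_condition(trigger):
    """Build a hypothetical CPU-above-90% threshold condition."""
    return {
        "displayName": "CPU above 90%",
        "conditionThreshold": {
            "filter": ('metric.type="compute.googleapis.com/instance/cpu/utilization" '
                       'AND resource.type="gce_instance"'),
            # Smoothing: average each series over 5-minute alignment windows
            "aggregations": [{"alignmentPeriod": "300s", "perSeriesAligner": "ALIGN_MEAN"}],
            "comparison": "COMPARISON_GT",
            "thresholdValue": 0.9,  # cpu/utilization is a 0.0-1.0 fraction
            # Assumed equivalent of the console's "Most recent value"
            "duration": "0s",
            "trigger": trigger,
        },
    }


if __name__ == "__main__":
    print(json.dumps(threshold_condition(TRIGGERS["all"]), indent=2))
```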

    If you do need "CPU usage is above 90% for 5 minutes", the correct way to do it is by denoising/smoothing your data:

    • set the alignment period to 5 minutes (or whatever sliding window you want)
    • then choose a reasonable aligner (e.g. mean, which averages the values)
    • the chart will then have fewer data points, but they will be less noisy, and you can act on the latest value.
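The effect of a 5-minute mean aligner can be sketched in plain Python: raw samples are grouped into fixed windows and averaged, and the alert looks only at the latest averaged point. The helper `align_mean` and the sample data are hypothetical, and the real aligner's window boundaries and handling of partial windows may differ from this simplification.

```python
from statistics import mean


def align_mean(samples, period):
    """Group (timestamp_seconds, value) samples into fixed windows of
    `period` seconds and average each window.

    Returns a list of (window_start, mean_value) sorted by window_start.
    """
    buckets = {}
    for ts, value in samples:
        start = ts - ts % period  # start of the window this sample falls in
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


THRESHOLD = 0.9

# Hypothetical per-minute CPU samples over 10 minutes with one spike at the end
spiky = [(t, 0.99 if t == 540 else 0.5) for t in range(0, 600, 60)]
# Hypothetical samples where the last 5 minutes are consistently high
sustained = [(t, 0.95 if t >= 300 else 0.5) for t in range(0, 600, 60)]

if __name__ == "__main__":
    for name, samples in [("spiky", spiky), ("sustained", sustained)]:
        latest_raw = samples[-1][1]
        latest_aligned = align_mean(samples, 300)[-1][1]
        print(name, latest_raw > THRESHOLD, latest_aligned > THRESHOLD)
```

Alerting on the latest raw value would fire on the single spike, while the latest 5-minute mean (about 0.6) stays below 0.9; only the sustained case pushes the averaged value over the threshold.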