grafana, grafana-alerts

Is it possible to accelerate time in Grafana?


What I actually want to do:

I created dashboards to monitor alert status in Grafana, and I generated fake data in my system to simulate alert situations on these dashboards. The data covers the time range from now to now + 12h. With real data it takes a long time to analyze the alert status, so I cannot iterate on my alert rules very flexibly: I have to wait until the end of this period to see the alert status in the system. (I have many cases like this.) Grafana produces Pending, Alerting, and OK states according to the records in my database. Is there a way to verify my tests quickly without waiting out this time?


Solution

  • The main problem is that this is fairly expensive to do in a data-source-agnostic way. The way it worked in Bosun was that you would select a time range, and then an interval or a number of queries to run.

    Setting both From and To enables testing multiple iterations of the selected alert over time. The number of iterations depends on the settings of the two linked fields, Intervals and Step Duration (3); changing one changes the other. Intervals is the number of runs to do, evenly spaced over the duration from From to To, and Step Duration is how much time in minutes should be between intervals. Doing a test over time populates the Timeline tab (5), which draws a clickable graphic of severity states for each item in the set:

    [Screenshot: Bosun plugin]

    It would then run all those queries with a pool limiting the number of simultaneous queries. For an interval of, say, 5 minutes, it would run adjacent 5-minute queries.
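A minimal sketch of that idea, assuming nothing about Bosun's or Grafana's internals: the time range is cut into adjacent windows of one step each, and a bounded thread pool evaluates one query per window. The names `split_range`, `run_backtest`, and `fake_query` are hypothetical, and `fake_query` only stands in for a real data source call.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta


def split_range(start, end, step):
    """Return adjacent (window_start, window_end) pairs covering start..end."""
    windows = []
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)  # last window may be shorter
        windows.append((cursor, window_end))
        cursor = window_end
    return windows


def run_backtest(query, start, end, step, max_parallel=4):
    """Evaluate `query` once per window, at most `max_parallel` at a time."""
    windows = split_range(start, end, step)
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # map() returns results in window order, even if queries
        # finish out of order.
        results = list(pool.map(lambda w: query(*w), windows))
    return list(zip(windows, results))


def fake_query(window_start, window_end):
    # Hypothetical stand-in for a data source call: the alert condition
    # is "true" for every window that starts at 06:00 or later.
    return window_start.hour >= 6


start = datetime(2024, 1, 1, 0, 0)
timeline = run_backtest(
    fake_query, start, start + timedelta(hours=12), timedelta(minutes=5)
)
print(len(timeline))  # 12 h in 5-minute steps -> 144
```

The pool size caps the load on the data source, and the number of windows is simply the range divided by the step, which is why the two fields are linked.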

    This would speed up the alert authoring and testing workflow significantly, but it would best be implemented as a job system. With more expensive queries, or a range/interval combination that amounts to a fair number of runs, it may take a minute or so, and having to wait on an open network connection is less than ideal.

    I found I generally used it in two modes:

      • To tweak a specific alert that had fired at some time
      • To get a general overview of how much the alert rule would trigger for the historical data

    For the general overview, a larger time range is generally wanted, which means more queries if the interval is kept the same. And with a feature like FOR (Pending), you would have to use the same interval it would actually run at.
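To illustrate why the interval matters for FOR (Pending), here is a simplified model of how per-evaluation results could be replayed into ok, pending, and alerting states. It is a sketch under stated assumptions, not Grafana's or Bosun's actual evaluator: the function name `replay_states` is hypothetical, and the rule "alerting once the condition has held for at least the FOR duration" is an assumption.

```python
from datetime import datetime, timedelta


def replay_states(evaluations, for_duration):
    """Turn ordered (timestamp, firing) pairs into (timestamp, state) pairs."""
    states = []
    firing_since = None  # when the current unbroken firing streak began
    for timestamp, firing in evaluations:
        if not firing:
            firing_since = None
            state = "ok"
        else:
            if firing_since is None:
                firing_since = timestamp
            # Assumption: alerting once the streak has lasted for_duration.
            if timestamp - firing_since >= for_duration:
                state = "alerting"
            else:
                state = "pending"
        states.append((timestamp, state))
    return states


start = datetime(2024, 1, 1, 0, 0)
pattern = [False, True, True, True, False, True]
# One evaluation per minute, matching the interval the rule would run at.
evaluations = [
    (start + timedelta(minutes=i), firing) for i, firing in enumerate(pattern)
]
for timestamp, state in replay_states(evaluations, timedelta(minutes=2)):
    print(timestamp.strftime("%H:%M"), state)
```

Because the state at each step depends on how long the condition has already held, replaying with a coarser interval than the rule really uses would change when pending turns into alerting.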

    So it is possible, but it has some limitations, and some care needs to be taken to do it right. In my experience, though, it is extremely useful.