
Prometheus: trigger a single alert if a set of nodes is down


I have the targets below configured on my Prometheus server. All servers are listed in the Target1.yml file, and the router details are stored in the Router.yml file. Each site has a unique 4-digit number; in this example it is "1234". We have thousands of sites like this (10005 nodes in total). Whenever a router goes down or there is a power outage, we get 5 alerts in total for each site.

Target1.yml:

node1-1234.example.com
node2-1234.example.com
node3-1234.example.com
node1-4567.example.com
node2-4567.example.com
node3-4567.example.com

Router.yml:

router1-1234.example.com
router2-1234.example.com
router1-4567.example.com
router2-4567.example.com

I am looking for a way to trigger only the router alerts when there is a power outage, ignoring node1/node2/node3. Can you please help me achieve this?


Solution

  • It seems you can use https://github.com/prometheus/blackbox_exporter or https://github.com/czerwonk/ping_exporter to probe the hosts.

    Then create alert rules based on the exported metrics.

    Example Prometheus rules using the blackbox exporter: https://awesome-prometheus-alerts.grep.to/rules#blackbox-1
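
As a rough sketch of how the suggestion above could be wired up, the rule file below defines one alert for routers and one for nodes, both based on the blackbox exporter's probe_success metric, and derives a site label from the hostname. The job names blackbox-routers and blackbox-nodes, the alert names, the `for` durations, and the assumption that the instance label holds the probed hostname (for example router1-1234.example.com) are all illustrative and not taken from the question; adjust them to your setup.

```yaml
# Prometheus rule file (sketch). Job names and durations are assumptions.
groups:
  - name: site-availability
    rules:
      - alert: RouterDown
        # Copy the 4-digit site number out of the hostname into a "site" label.
        expr: |
          label_replace(
            probe_success{job="blackbox-routers"} == 0,
            "site", "$1", "instance", "[^-]+-([0-9]+)[.].*"
          )
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Router {{ $labels.instance }} at site {{ $labels.site }} is unreachable"

      - alert: NodeDown
        expr: |
          label_replace(
            probe_success{job="blackbox-nodes"} == 0,
            "site", "$1", "instance", "[^-]+-([0-9]+)[.].*"
          )
        # Longer than RouterDown so the router alert is already firing
        # by the time the node alerts reach Alertmanager.
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Node {{ $labels.instance }} at site {{ $labels.site }} is unreachable"
```

With both alerts carrying the same site label, an Alertmanager inhibit rule can mute the node alerts while a router alert for the same site is firing. This fragment goes into alertmanager.yml and assumes the alert names used above:

```yaml
# alertmanager.yml fragment (sketch).
inhibit_rules:
  - source_matchers:
      - 'alertname = "RouterDown"'
    target_matchers:
      - 'alertname = "NodeDown"'
    # Only inhibit when source and target alerts share the same site value.
    equal:
      - site
```

Note that this mutes node alerts as soon as any one router at the site is down. If you also want the two router alerts of a site folded into a single notification, grouping by site (group_by on the Alertmanager route) is one option.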