mesos, marathon, mesosphere, health-monitoring, alerts

Alerts for apps failing Marathon healthchecks


I've been configuring HTTP healthchecks for all my apps in Marathon, and they are working nicely. The trouble is that Marathon will keep stepping in and restarting a container that is failing its healthcheck, and I won't know unless I happen to be looking at the Marathon UI.

Is there a way to retrieve all apps that have a failed healthcheck so I can send an email alert or similar?


Solution

  • Marathon publishes information about failing healthchecks on its event bus, so you can write a simple service that consumes Marathon's health check events ("eventType": "instance_health_changed_event") and translates them into a metric, an alert, or whatever else you need.

    For reference, I can recommend allegro/appcop, a service that scales down unhealthy applications. Its code could easily be adapted to do what you want.
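
As a rough illustration of the event-consuming service described above, here is a minimal Python sketch. It makes several assumptions that are not in the answer: that the event bus is read as a server-sent event stream from `/v2/events`, that the event payload carries `healthy`, `runSpecId` and `instanceId` fields, and that Marathon lives at the placeholder address `marathon.example.com:8080`. The helper names are hypothetical. Check the event format of your Marathon version before relying on it.

```python
import json
import urllib.request

# Placeholder address (assumption); point this at your own Marathon instance.
MARATHON_EVENTS_URL = "http://marathon.example.com:8080/v2/events"


def unhealthy_events(lines):
    """Yield health-change events that report an unhealthy instance.

    `lines` is any iterable of text lines in server-sent-event format,
    where each event payload arrives on a line starting with "data:".
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip "event:" lines, comments and blank separators
        try:
            event = json.loads(line[len("data:"):])
        except json.JSONDecodeError:
            continue  # ignore payloads that are not valid JSON
        if not isinstance(event, dict):
            continue
        # Field names below are assumed from the event type named in the answer.
        if (event.get("eventType") == "instance_health_changed_event"
                and event.get("healthy") is False):
            yield event


def stream_lines(url=MARATHON_EVENTS_URL):
    """Open the event stream and yield it line by line as text."""
    request = urllib.request.Request(url, headers={"Accept": "text/event-stream"})
    with urllib.request.urlopen(request) as response:
        for raw in response:
            yield raw.decode("utf-8", errors="replace")


def main():
    """Print one line per unhealthy instance; replace print with email or paging."""
    for event in unhealthy_events(stream_lines()):
        print(f"ALERT: {event.get('runSpecId')} instance "
              f"{event.get('instanceId')} is unhealthy")
```

Calling `main()` would block and print an alert line whenever a matching event arrives. Keeping the parsing in `unhealthy_events` separate from the network code makes it possible to test the filter against recorded event lines without a running cluster.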