slack slack-api health-monitoring health-check

Slack health check consistency


Our company sends messages through a Slack incoming webhook at hooks.slack.com/services/myWebHookId, and we want to check every 30 seconds or so whether it is reachable.

According to Slack's health status check documentation, I can always check whether Slack is online by querying its status API (currently https://status.slack.com/api/v2.0.0/current), which reports its current health.
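A minimal Python sketch of polling that endpoint. It assumes the JSON response has a top-level `status` field that reads `"ok"` when there are no incidents, plus an `active_incidents` list whose entries carry a `title`. Verify those field names against Slack's status API documentation before relying on them; `parse_status` and `fetch_status` are hypothetical helper names.

```python
import json
import urllib.request

# URL quoted in the question. The response fields used below ("status",
# "active_incidents", "title") are assumptions, not confirmed by this post.
STATUS_URL = "https://status.slack.com/api/v2.0.0/current"


def parse_status(body: str) -> dict:
    """Summarize a status payload: overall OK flag plus active incident titles."""
    data = json.loads(body)
    incidents = data.get("active_incidents", [])
    return {
        "ok": data.get("status") == "ok",
        "incident_titles": [i.get("title", "") for i in incidents],
    }


def fetch_status(url: str = STATUS_URL, timeout: float = 5.0) -> dict:
    """Fetch the status API and summarize it (requires network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_status(resp.read().decode("utf-8"))
```

Calling `fetch_status()` needs network access. A healthy result only means the status page reports no incidents; it says nothing direct about your own webhook.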

My question is about consistency. Is it possible for the Slack status page, status.slack.com, to resolve correctly and report a healthy status while one of its webhook services, hooks.slack.com (the service I actually use), is broken, inaccessible, or has a bad DNS record?

The point is that the URL of Slack's health check is completely different from the URL of the webhook service we actually use to send Slack messages.

Is this health check good enough? Will the first always represent the second? Is it reliable enough?

Is it possible to check the webhook service at hooks.slack.com directly instead?
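One partial way to address the DNS concern is to check hooks.slack.com directly at the network level. This Python sketch (standard library only; `resolves` and `tcp_reachable` are hypothetical names) confirms that the host name resolves and that a TCP connection to port 443 succeeds. It does not tell you whether Slack will accept a post to your specific webhook URL.

```python
import socket

# Network-level checks only: DNS resolution and TCP connectivity.
# Neither proves that a particular webhook URL is valid or working.


def resolves(host: str) -> bool:
    """Return True if the host name resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(host, None)) > 0
    except socket.gaierror:
        return False


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, DNS failure, ...
        return False
```

For example, `resolves("hooks.slack.com") and tcp_reachable("hooks.slack.com")` would catch a missing DNS record or a blocked outbound connection from your own network.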

Any recommendations or best practices?


Solution

  • The answer below was sent to me by the Slack support team. They were also kind enough to let me paste their response here:

    status.slack.com is updated with details manually when we notice major issues. So there could be a window where the status site does not reflect an issue with hooks.slack.com from the time it goes down, until we spot the issue here and update the site. However, hooks.slack.com going down would be huge, and we'd see immediate impact from it. So I'd expect the window in that scenario to be quite small.

    It is far, far more likely that there could be a potential issue with a specific webhook, than the possibility of the entire service having issues. And in that case, the status site would not be updated. If there is an issue with a particular webhook, that should be noticed based on errored responses when trying to use the webhook. In that scenario you can reach out to us and we will work to help resolve the issue. The webhooks are generally extremely reliable, but if you do have concerns you could create a second webhook URL for a channel, and use that as a fallback in your web service if you start receiving errors on your primary webhook.
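The fallback suggested above could look roughly like this Python sketch. `post_with_fallback` and `http_send` are hypothetical names. The sketch assumes a successful webhook post returns HTTP 200 and treats any other status, or any network error, as a failure that moves on to the next URL. The `send` parameter lets the HTTP call be swapped out in tests.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, List, Sequence

# Hypothetical sender signature: (url, JSON body bytes) -> HTTP status code.
Sender = Callable[[str, bytes], int]


def http_send(url: str, body: bytes, timeout: float = 5.0) -> int:
    """POST a JSON body and return the HTTP status code (requires network)."""
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx still carry a status code
        return err.code


def post_with_fallback(urls: Sequence[str], text: str, send: Sender = http_send) -> str:
    """Try each webhook URL in order and return the first one that answers 200.

    Raises RuntimeError if every URL fails (non-200 status or network error).
    """
    body = json.dumps({"text": text}).encode("utf-8")
    errors: List[str] = []
    for url in urls:
        try:
            status = send(url, body)
        except OSError as exc:  # DNS failure, timeout, refused connection, ...
            errors.append(f"{url}: {exc}")
            continue
        if status == 200:
            return url
        errors.append(f"{url}: HTTP {status}")
    raise RuntimeError("all webhooks failed: " + "; ".join(errors))
```

Usage would be something like `post_with_fallback([PRIMARY_URL, BACKUP_URL], "Deploy finished")`, where both names stand for your two webhook URLs. Logging which URL succeeded makes it visible when the primary has started failing.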

    Moreover, they added:

    There isn't a specific testing approach you can use for the webhooks. However, you could send a message with a deliberately incorrect payload. This would result in an invalid_payload error and no message actually posted in the channel. Confirming that you get this error correctly when expected could be used as a test. It is possible that there are scenarios that this test would miss so you'd still want to incorporate proper error handling for actual messages, but it should be a fairly reliable approach.
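The invalid-payload probe described above, run on the 30-second cadence from the question, might be sketched in Python as follows. Assumptions: Slack answers a malformed JSON body with HTTP 400 and the text `invalid_payload` (check Slack's incoming-webhook error documentation), `WEBHOOK_URL` is a placeholder for your real URL, and `probe`, `is_healthy` and `monitor` are hypothetical names. As the support team notes, this complements rather than replaces error handling on real messages.

```python
import time
import urllib.error
import urllib.request
from typing import Optional, Tuple

# Placeholder taken from the question; substitute your real webhook URL.
WEBHOOK_URL = "https://hooks.slack.com/services/myWebHookId"


def probe(url: str, timeout: float = 5.0) -> Tuple[Optional[int], str]:
    """POST a deliberately malformed body; return (HTTP status, response text).

    Returns (None, error text) if no HTTP response was received at all.
    """
    req = urllib.request.Request(
        url,
        data=b"{not valid json",  # intentionally broken JSON, so nothing is posted
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", "replace")
    except urllib.error.HTTPError as err:
        return err.code, err.read().decode("utf-8", "replace")
    except OSError as exc:  # DNS failure, timeout, refused connection, ...
        return None, str(exc)


def is_healthy(status: Optional[int], text: str) -> bool:
    """Assumed healthy signal: Slack answered and rejected the body as invalid_payload."""
    return status == 400 and "invalid_payload" in text


def monitor(url: str = WEBHOOK_URL, interval: float = 30.0, rounds: int = 3) -> None:
    """Probe `rounds` times, sleeping `interval` seconds between probes."""
    for i in range(rounds):
        status, text = probe(url)
        print("healthy" if is_healthy(status, text) else f"UNHEALTHY: {status} {text!r}")
        if i < rounds - 1:
            time.sleep(interval)
```

Any other outcome (a 404 for a revoked webhook, a 5xx, or no response at all) is reported as unhealthy, which also covers the DNS and reachability cases raised in the question.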