request microservices endpoint health-monitoring

How to avoid circular dependency in multiple API healthchecks?

We have multiple API microservices for which we want to implement health checks. When asking for the current health of one API, we also want to include its dependencies (for example, its database and the other APIs it uses). The problem is that these APIs use each other, so a health check can lead to an infinite loop due to circular dependency. There are multiple ways to solve this. Ideally, what I would want is that:

One API microservice has 1 health check endpoint.
The health check also reports the health of the API's outside dependencies.
The endpoint is as minimal as possible. No overengineering something as simple as a healthcheck and ideally with no additional parameters.

An example:

API A: Dependencies are: API B and API C
API B: Dependencies are: API A and API C
API C: Dependencies are: API A and API B
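To make the loop concrete, here is a minimal in-memory sketch of the example above. The `DEPENDENCIES` table and the `naive_health` function are hypothetical names used only for illustration; real services would call each other over HTTP, and the `max_depth` guard stands in for a request timeout.

```python
# Dependency graph from the example (illustrative only).
DEPENDENCIES = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
}


def naive_health(service: str, depth: int = 0, max_depth: int = 10) -> bool:
    """Naive check: a service is healthy if every dependency reports healthy.

    Each dependency runs the same naive check, so A asks B, B asks A, and so on.
    max_depth is an assumption standing in for a timeout; without it the
    calls would never end.
    """
    if depth >= max_depth:
        raise RecursionError(f"health check of {service} never terminated")
    return all(
        naive_health(dep, depth + 1, max_depth) for dep in DEPENDENCIES[service]
    )


try:
    naive_health("A")
except RecursionError as exc:
    print("circular health check detected:", exc)
```

Calling `naive_health("A")` bounces between A and B until the depth guard trips, which is the behaviour the question is trying to avoid.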

There are two arguments I have found. One is that dependencies should never be included, as they are irrelevant. This one is argued here. The other is that, since an API can't perform its tasks when a dependency breaks, dependencies should definitely be included, argued here. So far I have seen the following solutions/arguments, none of which provides an easy way to satisfy all the points above together.

1. Don't include the dependencies. This one is valid in some sense: if you monitor all the microservices anyway, why also see them as dependencies of one another? My objection is that there can be many reasons why access or a connection can break between applications. When that happens, the whole API is unhealthy because it can't perform its tasks.
2. Have two endpoints: one which includes the dependencies, and one which does not. When you are calling one API from another, use the one without the dependencies. I can understand this one, however it can overcomplicate something that should be a very simple endpoint. People can also make the main mistake of circular dependency here if they are unfamiliar with the system.
3. Same as the previous one, but with 1 endpoint and an optional query parameter, like &includeDependencies, which is false by default. This seems to be the most reasonable, but it can still lead to circular dependency by human error.
4. Have each endpoint send the list of already checked endpoints when calling for the health of another one. With this one, I definitely see that it is either overcomplicating or overengineering.
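For reference, the last option (passing along the already checked endpoints) could look roughly like the sketch below. It is an in-memory simulation under stated assumptions: `DEPENDENCIES`, `OWN_RESOURCES_OK` and `health` are hypothetical names, and in a real system the `visited` set would have to travel with each HTTP request, for example as a query parameter or header.

```python
# Illustrative dependency graph and per-service state (assumptions).
DEPENDENCIES = {
    "A": ["B", "C"],
    "B": ["A", "C"],
    "C": ["A", "B"],
}
OWN_RESOURCES_OK = {"A": True, "B": True, "C": True}


def health(service: str, visited: frozenset = frozenset()) -> dict:
    """Check a service and its dependencies, skipping services already visited."""
    visited = visited | {service}
    report = {service: OWN_RESOURCES_OK[service]}
    for dep in DEPENDENCIES[service]:
        if dep in visited:
            continue  # already checked further up the call chain: break the cycle
        report.update(health(dep, visited))
    return report


print(health("A"))
```

The recursion terminates, but every service has to parse, extend and forward the visited list, and some services are still checked more than once along different paths. That extra machinery is the overhead referred to above.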

How would one solve the issue described above?

Solution

I agree with your objections to options 1 and 4. The decision between option 2 or 3 is a matter of taste. I believe option 2 is easier to name and thus it avoids making mistakes. Have 2 endpoints, call them (suggestion):

Availability check
Health check

The availability check consists of a check of the API’s exclusive resources only.

The implementation of the health check additionally invokes the availability check of each API it directly depends on. Because an availability check never calls another API, following this principle means there won’t be any circular invocation of checks.
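A minimal sketch of this two-check design, simulated in memory. The `Service` class, its `availability` and `health` methods, and the `db_up` table are hypothetical names chosen for illustration; in a real deployment each `availability()` call on a dependency would be an HTTP request to that API's availability endpoint.

```python
from typing import Callable, Dict, List


class Service:
    """In-memory stand-in for one API microservice (illustrative only)."""

    def __init__(self, name: str, own_checks: Dict[str, Callable[[], bool]]) -> None:
        self.name = name
        self.own_checks = own_checks             # exclusive resources, e.g. its database
        self.dependencies: List["Service"] = []  # APIs this one calls directly

    def availability(self) -> Dict[str, bool]:
        """Availability check: exclusive resources only, never another API."""
        return {resource: check() for resource, check in self.own_checks.items()}

    def health(self) -> dict:
        """Health check: own resources plus the availability of direct dependencies."""
        own = self.availability()
        deps = {d.name: all(d.availability().values()) for d in self.dependencies}
        return {
            "healthy": all(own.values()) and all(deps.values()),
            "own": own,
            "dependencies": deps,
        }


# Simulated state of each API's private database (assumption for the demo).
db_up = {"A": True, "B": False, "C": True}

a = Service("A", {"database": lambda: db_up["A"]})
b = Service("B", {"database": lambda: db_up["B"]})
c = Service("C", {"database": lambda: db_up["C"]})

# Same circular dependency graph as in the question.
a.dependencies = [b, c]
b.dependencies = [a, c]
c.dependencies = [a, b]

print(a.health())
```

Even though A, B and C depend on each other, `a.health()` returns after exactly two availability calls, because `health` only ever calls `availability` and `availability` calls nobody. With B's database marked as down, A reports itself as unhealthy and names B as the failing dependency.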

To address your objections:

this can overcomplicate something that should be a very simple endpoint

To make the implementation less complex and mitigate the risk of making mistakes, implement in two phases:

Phase 1) Start by implementing the availability check of every single API. One after the other or in parallel, it does not matter, as they don’t depend on each other. With this check, you instantly learn which API has a problem. It's true that an availability check alone does not tell you whether the API is able to process requests successfully, but for the purpose of monitoring your APIs to quickly identify where you need administrative intervention, it is big progress. Make this check your primary source of monitoring for now.

Phase 2) Now that you have all availability checks in place, it is easy to implement the health checks, again one after the other or in parallel. This additionally respects the fact that a connection may break between applications, as you stated for option 1. You can now switch to the health check as your primary source of monitoring.

People can also make the main mistake of circular dependency here if they are unfamiliar with the system

You need to find a trade-off between fulfilling all requirements and keeping the solution as simple as possible. If you are afraid that an implementation of a health check invokes the health check of another API (and not the availability check, as it is supposed to), then add authentication / authorization to the health check so that only your monitoring tool is allowed to call it. By the way, being able to implement this more easily might be another advantage compared to option 3.
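One possible way to restrict the health check to the monitoring tool is a shared bearer token, sketched below. The token value, the `Authorization` header convention and the `is_monitoring_tool` helper are assumptions for illustration; any authentication scheme your platform already provides would serve the same purpose.

```python
import hmac

# Hypothetical shared secret known only to the monitoring tool.
MONITORING_TOKEN = "replace-me"


def is_monitoring_tool(headers: dict) -> bool:
    """Return True only if the request carries the monitoring tool's token."""
    supplied = headers.get("Authorization", "")
    expected = f"Bearer {MONITORING_TOKEN}"
    # Constant-time comparison avoids leaking the token through timing differences.
    return hmac.compare_digest(supplied.encode(), expected.encode())


def handle_health_request(headers: dict) -> int:
    """Return an HTTP-style status: 200 for the monitoring tool, 403 for anyone else."""
    return 200 if is_monitoring_tool(headers) else 403


print(handle_health_request({"Authorization": "Bearer replace-me"}))
print(handle_health_request({}))
```

The availability check stays open to the other APIs, while a peer API that mistakenly calls the health check is rejected immediately, which surfaces the mistake instead of creating a loop.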