Member of MongoDB replica set still healthy with CPU > 100%

I have a standard MongoDB replica set with 3 members (in EC2). Everything works fine, but from time to time the primary has CPU > 100%. In my opinion this instance is unhealthy, so the replica set should elect another primary. But it doesn't happen.

I suppose MongoDB considers a member unhealthy only when it is not accessible over the network, because if I shut down the instance, the election works fine.

With CloudWatch I can set an event (stop/restart the instance) when the CPU alarm is triggered, but I think this is more of a workaround than a solution.

So, when does MongoDB consider a member unhealthy?

Solution

It's a bit complicated, but generally a member of a replica set will be considered unhealthy when it stops responding to replica set heartbeats. These are sent every 2 seconds and a response is expected within 10 seconds (reference).
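To see how the replica set itself currently rates each member, the output of rs.status() can be inspected. The following is only a minimal sketch: summarizeMembers is a hypothetical helper name, and it assumes the usual rs.status() shape, where each entry in members carries name, stateStr, health (1 or 0) and, for remote members, lastHeartbeat.

```javascript
// Hypothetical helper: condense an rs.status()-shaped document into
// one small record per replica set member.
function summarizeMembers(status) {
  return status.members.map((m) => ({
    name: m.name,
    state: m.stateStr,
    // health is 1 when the member answers heartbeats, 0 otherwise.
    healthy: Number(m.health) === 1,
    // The member reporting the status has no heartbeat entry for itself.
    lastHeartbeat: m.lastHeartbeat ? m.lastHeartbeat.toISOString() : null,
  }));
}
```

In the mongo shell this would be called as printjson(summarizeMembers(rs.status())). A busy primary that still answers heartbeats will keep showing up as healthy here.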

The heartbeats are intentionally lightweight and do not require significant resources to formulate a reply, so even a busy system can remain in a healthy state.

To take a step back for a second, CPU over 100% is not necessarily unhealthy, particularly on a modern multi-core system. Generally it is a better idea to measure the health of the database instance by whether or not you are seeing slow queries or other sorts of degradation in performance. By all means track down the source of the CPU spikes and attempt to address or mitigate them, but generally CPU utilization is not going to be a great barometer of database performance (unless of course all cores are at 100% and the database ends up starved for CPU).
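As a rough illustration of the multi-core point, assuming the CPU figure comes from a per-process tool such as top, where 100% means one fully used core, the reading can be divided by the number of cores to get the share of the whole host. The function name cpuShareOfHost is made up for this sketch.

```javascript
// Hypothetical helper: convert a per-process CPU reading (100 = one full core)
// into a percentage of the whole machine's CPU capacity.
function cpuShareOfHost(processCpuPercent, coreCount) {
  if (!(coreCount > 0)) {
    throw new RangeError("coreCount must be a positive number");
  }
  return processCpuPercent / coreCount;
}
```

For example, cpuShareOfHost(120, 4) gives 30, so a mongod showing 120% on a 4-core instance is using under a third of the available CPU.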

Finally, there is no need to shut down a MongoDB instance or otherwise make it unhealthy to have a new primary elected. Instead, simply issue the rs.stepDown() command on the primary: it will mark itself ineligible for election (for 60 seconds by default) and a new primary will be chosen.
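A small sketch of the step-down, in case it helps. stepDownIfPrimary is a hypothetical wrapper, not part of the shell; it assumes it is handed the shell's rs helper and relies only on rs.status() (where myState 1 means PRIMARY) and rs.stepDown(seconds).

```javascript
// Hypothetical wrapper: step down only if the connected node is the primary.
// rsHelper is expected to be the mongo shell's `rs` object.
function stepDownIfPrimary(rsHelper, stepDownSecs = 60) {
  const status = rsHelper.status();
  // myState 1 means the node we are connected to is currently PRIMARY.
  if (Number(status.myState) !== 1) {
    return false;
  }
  // The node stays ineligible for re-election for stepDownSecs seconds.
  rsHelper.stepDown(stepDownSecs);
  return true;
}
```

From a shell connected to the primary, stepDownIfPrimary(rs) would trigger an election; connected to a secondary, it does nothing and returns false.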