mongodb automation mongodb-mms mongodb-replica-set

MongoDB Ops Manager can't start mongos and shard


I came across a problem: I have an Ops Manager instance that is supposed to run a MongoDB cluster as an automated cluster.

Suddenly the servers started going down unexpectedly, yet none of the log files contain any errors indicating what the problem is.

Ops Manager gets stuck on the blue label:

We are deploying your changes. This might take a few minutes

And it just never goes away.

Because this environment uses the automation feature, MMS manages the user on the servers and runs all of the processes as the "mongod" user, which I can't access even as root (administrator).

As for Ops Manager itself, it shows a shard in a replica set as down although it is live, and it reports a mongos that is dead as alive.

Has anyone run into this situation before and can help?

Thanks, Eliran.


Solution

  • Problem found: somehow the servers in the cluster were not all using the same NTP source, so their clocks were out of sync. Every time Ops Manager did something, it got responses with wrong timestamps and could not apply its time limits.

    After reconfiguring all the servers to use the same NTP source again, everything went back to how it should have been :)
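To illustrate why unsynchronized clocks can produce exactly the symptoms in the question (a live shard shown as down, a dead mongos shown as alive), here is a minimal Python sketch. `ntp_offset` and `round_trip_delay` use the standard NTP offset and delay formulas. The `looks_alive` freshness check, the 60-second timeout, the 1-second tolerance, and the host names are all hypothetical: they assume a monitor that compares a host's self-reported timestamp against its own clock, and are not Ops Manager's actual logic.

```python
from __future__ import annotations


def ntp_offset(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP clock offset estimate (seconds).

    t0: local time the request was sent
    t1: remote time the request was received
    t2: remote time the reply was sent
    t3: local time the reply was received
    A positive result means the remote clock is ahead of the local one.
    """
    return ((t1 - t0) + (t2 - t3)) / 2.0


def round_trip_delay(t0: float, t1: float, t2: float, t3: float) -> float:
    """Standard NTP round-trip delay: total elapsed minus remote processing time."""
    return (t3 - t0) - (t2 - t1)


def looks_alive(last_ping_remote_ts: float, monitor_now: float, timeout: float) -> bool:
    """HYPOTHETICAL freshness check (not Ops Manager's real logic): a process
    counts as alive if its last self-reported ping timestamp is no more than
    `timeout` seconds older than the monitor's own clock."""
    return monitor_now - last_ping_remote_ts <= timeout


def skewed_hosts(offsets: dict[str, float], tolerance: float) -> list[str]:
    """Return the hosts whose clock offset exceeds `tolerance` seconds either way."""
    return sorted(host for host, off in offsets.items() if abs(off) > tolerance)


if __name__ == "__main__":
    # Remote clock 120 s ahead, 10 s network delay each way, 1 s processing.
    print(ntp_offset(1000, 1130, 1131, 1021), round_trip_delay(1000, 1130, 1131, 1021))

    # A live shard whose clock is 120 s behind the monitor pings "now",
    # but its timestamp looks 120 s old, so it is reported as down.
    print(looks_alive(last_ping_remote_ts=880, monitor_now=1000, timeout=60))

    # A dead mongos whose clock was 300 s ahead last pinged 200 s ago,
    # but its timestamp lies in the monitor's future, so it still looks alive.
    print(looks_alive(last_ping_remote_ts=1100, monitor_now=1000, timeout=60))

    # Hypothetical host names; flag anything more than 1 s off.
    print(skewed_hosts({"shard-a": 0.02, "shard-b": -120.0, "mongos-1": 300.0}, tolerance=1.0))
```

Pointing every server at the same NTP source, as in the fix above, drives all offsets toward zero, so timestamp-based checks like this one behave as intended again.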